ASTHMA RELATED GENES



Introduction 

Asthma is a disease of reversible bronchial obstruction, characterized by
airway inflammation, epithelial damage, airway smooth muscle hypertrophy and
bronchial hyperreactivity. Many asthma symptoms can be controlled by medical
intervention, but the incidence of asthma-related death and severe illness continues
to rise in the United States. The approximately 4,800 deaths in 1989 marked a 46
percent increase since 1980. As many as 12 million people in the United States
have asthma, up 66 percent since 1980, and the disease's annual medical and
indirect costs are estimated at over $6 billion.

Two common subdivisions of asthma are atopic (allergic, or extrinsic) asthma
and non-atopic (intrinsic) asthma. Atopy is characterized by a predisposition to
raise an IgE antibody response to common environmental antigens. In atopic
asthma, asthma symptoms and evidence of allergy, such as a positive skin test to
common allergens, are both present. Non-atopic asthma may be defined as
reversible airflow limitation in the absence of allergies.

The smooth muscle surrounding the bronchi is able to rapidly alter airway
diameter in response to stimuli. When the response is excessive, it is termed
bronchial hyperreactivity, a characteristic of asthma thought to have a heritable
component. Studies have demonstrated a genetic predisposition to asthma by
showing, for example, a greater concordance for this trait among monozygotic twins
than among dizygotic twins. The genetics of asthma is complex, however, and
shows no simple pattern of inheritance. Environment also plays a role in asthma
development; for example, children of smokers are more likely to develop asthma
than are children of non-smokers.

In recent years thousands of human genes have been cloned. In many
cases, gene discovery has been based on prior knowledge about the corresponding
protein, such as amino acid sequence, immunological reactivity, etc. This approach
has been very successful, but is limited in some important ways. One limitation is
that genes in these cases are identified based on knowledge of molecular level
protein properties. For a large number of important human genes, however, there
is little or no biochemical data concerning the encoded product. For example,
genes that predispose to human diseases, such as cystic fibrosis, Huntington's
disease, etc., are of interest because of their phenotypic effect. Biochemical
characterization of such genes may be secondary to genetic characterization.

A solution to this impasse has been found in combining classical genetic
mapping with the ability to identify genes and, if necessary, to sequence large
regions of chromosomes. Population and family studies enable genes associated
with a trait of interest to be localized to a relatively small region of a chromosome.
At this point, physical mapping can be used to identify candidate genes, and
various molecular biology techniques used to pick out mutated genes in affected
individuals. This "top-down" approach to gene discovery has been termed
positional cloning, because genes are identified based on position in the genome.

Positional cloning is now being applied to complex genetic diseases, which
affect a greater fraction of humanity than do the more simple and usually rarer
single gene disorders. Such studies must take into account the contribution of both
environmental and genetic factors to the development of disease, and must allow
for contributions to the genetic component by more than one, and potentially many,
genes. The clinical importance of asthma makes it of considerable interest to
characterize genes that underlie a genetic predisposition to this disease. Positional
cloning provides an approach to this goal.

Relevant Literature 

The symptoms and biology of asthma are reviewed in Chanez et al. (1994)
Odyssey 1:24-33. A review of bronchial hyperreactivity may be found in Smith and
McFadden (1995) Ann. Allergy, Asthma and Immunol. 74:454. Moss (1989) Annals
of Allergy 63:566 reviews the allergic etiology and immunology of asthma.

The genetic dissection of complex traits is discussed in Lander and Schork
(1994) Science 265:2037-2048. Genetic mapping of candidate genes for atopy
and/or bronchial hyperreactivity is described in Postma et al. (1995) N.E.J.M.
333:894; Marsh et al. (1994) Science 264:1152; and Meyers et al. (1994) Genomics
23:464.

Lawrence et al. (1994) Ann. Hum. Genet. 58:359 discuss an approach to the
genetic analysis of atopy and asthma. Genetic linkage between the alpha subunit
of the T cell receptor and IgE reactions has been noted by Moffat et al. (1994) The
Lancet 343:1597. Caraballo and Hernandez (1990) Tissue Antigens 35:182 noted
an association between HLA alleles and allergic asthma. Evidence of linkage of
atopy to markers on chromosome 11q has been seen in some British asthma
families (Cookson et al. (1989) Lancet i:1292-1295; Young et al. (1991) J. Med.
Genet. 29:236), but not in other British families (Lympany et al. (1992) Clin. Exp.
Allergy 22:1085-1092) or in families from Minnesota or Japan (Rich et al. (1992)
Clin. Exp. Allergy 22:1070-1076; and Hizawa et al. (1992) Clin. Exp. Allergy
22:1065).

The association of a polymorphism for the FcεRI-β gene and risk of atopy is
described in Hill et al. (1995) B.M.J. 311:776; Hill and Cookson (1996) Human Mol.
Genet. 5:959; and Shirakawa et al. (1994) Nature Genetics 7:125; an association of
FcεRI-β with bronchial hyperreactivity is described in van Herwerden (1995) The
Lancet 346:1262.

Collections of polymorphic markers from throughout the human genome
have been tested for linkage to asthma, described in Meyers et al. (1996) Am. J.
Hum. Genet. 59:A228 and Daniels et al. (1996) Nature 383:247-250. No linkage to
human chromosome 11p was detected in these studies.

Summary of the Invention 
Human genes associated with a genetic predisposition to asthma are
provided. The genes, herein termed ASTH1I and ASTH1J, are located close to
each other on human chromosome 11p, have similar patterns of expression, and
share common sequence motifs. The nucleic acid compositions are used to produce
the encoded proteins, which may be employed for functional studies, as a
therapeutic, and in studying associated physiological pathways. The nucleic acid
compositions and antibodies specific for the protein are useful as diagnostics to
identify a hereditary predisposition to asthma.

Brief Description of the Drawings
Figure 1: Genomic organization of the ASTH1I and ASTH1J genes. The
sizes of the exons are not to scale. Alternative exons are hatched. The direction of
transcription is indicated below each gene.

Description of the Specific Embodiments 
The provided ASTH1 genes and fragments thereof, encoded protein, ASTH1
genomic regulatory regions, and anti-ASTH1 antibodies are useful in the
identification of individuals predisposed to development of asthma, and for the
modulation of gene activity in vivo for prophylactic and therapeutic purposes. The
encoded ASTH1 protein is useful as an immunogen to raise specific antibodies, in
drug screening for compositions that mimic or modulate ASTH1 activity or
expression, including altered forms of ASTH1 protein, and as a therapeutic.

Asthma, as defined herein, is reversible airflow limitation in a patient over a
period of time. The disease is characterized by increased airway responsiveness to
a variety of stimuli, and airway inflammation. A patient diagnosed as asthmatic will
generally have multiple indications over time, including wheezing, asthmatic
attacks, and a positive response to methacholine challenge, i.e. a PC20 on
methacholine challenge of less than about 4 mg/ml. Guidelines for diagnosis may
be found in the National Asthma Education Program Expert Panel, Guidelines for
diagnosis and management of asthma, National Institutes of Health, 1991; Pub.
#91-3042. Atopy, respiratory infection and environmental predisposing factors may
also be present, but are not necessary elements of an asthma diagnosis. Asthma
conditions strictly related to atopy are referred to as atopic asthma.
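As an illustration of the methacholine criterion, the following Python sketch estimates a PC20 by log-linear interpolation between the last two challenge concentrations. The interpolation is a commonly used calculation that is assumed here; it is not taken from this specification. The function names and the example readings are hypothetical.

```python
import math

def pc20_mg_per_ml(c1, r1, c2, r2):
    """Interpolate the concentration producing a 20% fall in FEV1.

    c1, c2: second-to-last and last methacholine concentrations (mg/ml).
    r1, r2: percent fall in FEV1 after c1 and c2; they must bracket 20%.
    """
    if not (r1 < 20 <= r2):
        raise ValueError("the two points must bracket a 20% fall")
    log_c1, log_c2 = math.log10(c1), math.log10(c2)
    log_pc20 = log_c1 + (log_c2 - log_c1) * (20 - r1) / (r2 - r1)
    return 10 ** log_pc20

def positive_response(pc20, threshold=4.0):
    # "Less than about 4 mg/ml" is the threshold stated in the text.
    return pc20 < threshold

# Invented readings: 10% fall at 1 mg/ml, 30% fall at 4 mg/ml.
pc20 = pc20_mg_per_ml(c1=1.0, r1=10.0, c2=4.0, r2=30.0)
is_positive = positive_response(pc20)
```

With these invented readings the interpolated value falls halfway between the two concentrations on a logarithmic scale.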

The human ASTH1I and ASTH1J gene sequences are provided, as are the
genomic sequences 5' to ASTH1J. The major sequences of interest provided in the
sequence listing are as follows:

ASTH1J 5' Genomic Region DNA (SEQ ID NO:1)
ASTH1J alt1 cDNA (SEQ ID NO:2)
ASTH1J alt2 cDNA (SEQ ID NO:3)
ASTH1J alt3 cDNA (SEQ ID NO:4)

The ASTH1 locus has been mapped to human chromosome 11p. The traits
for a positive response to methacholine challenge and a clinical history of asthma
were shown to be genetically linked in a genome scan of the population of Tristan
da Cunha, a single large extended family with a high incidence of asthma
(discussed in Zamel et al. (1996) Am. J. Respir. Crit. Care Med. 153:1902-1906).
The linkage finding was replicated in a set of Canadian asthmatic families. The
region of strongest linkage was the marker D11S907 on the short arm of
chromosome 11. Additional markers were identified from the four megabase region
surrounding D11S907 from public databases and by original cloning of new
polymorphic microsatellite markers. Refinement of the region of interest was
obtained by genotyping new markers in the studied populations, and applying the
transmission disequilibrium test (TDT), which reflects the level of association
between marker alleles and disease status. TDT curves were superimposed on the
physical map. Molecular genetic techniques for gene identification were applied to
the region of interest. A one megabase genomic region was sequenced to high
accuracy, and the resulting data used for the sequence-based prediction of genes
and determination of the intron/exon structure of genes in the region.
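The transmission disequilibrium test is commonly computed as a McNemar-type chi-square statistic on transmissions from heterozygous parents to affected offspring. The Python sketch below assumes that standard form; the specification does not state how its TDT curves were computed. The function name and the counts are invented for illustration.

```python
import math

def tdt(transmitted, untransmitted):
    """Return (chi-square, p-value) for a one-degree-of-freedom TDT.

    transmitted:   heterozygous parents passing the marker allele to an
                   affected child.
    untransmitted: heterozygous parents passing the other allele instead.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Invented counts for one marker allele.
chi2, p = tdt(transmitted=30, untransmitted=10)
```

Repeating the calculation for each marker and plotting the statistic against map position would give a curve of the kind superimposed on the physical map.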
Nucleic Acid Compositions

ASTH1I produces a 2.8 kb mRNA expressed at high levels in trachea and
prostate, and at lower levels in lung and kidney and possibly other tissues. ASTH1I
cDNA clones have also been identified in prostate, testis and lung libraries.
Sequence polymorphisms are shown in Table 3. ASTH1I has at least three
alternate forms denoted as alt1, alt2, and alt3. The alternative splicing and start
codons give the three forms of ASTH1I proteins different amino termini. The
ASTH1I proteins, alt1, alt2 and alt3, are 265, 255 and 164 amino acids in length,
respectively.

A domain of the ASTH1I and ASTH1J proteins is similar in sequence to
transcription factors of the ets family. The ets family is a group of transcription
factors that activate genes involved in a variety of immunological and other
processes. The family members most similar to ASTH1I and ASTH1J are: ETS1,
ETS2, ESX, ELF, ELK1, TEL, NET, SAP-1, NERF and FLI. The ASTH1I and
ASTH1J proteins show similarity to each other. Over the ets domain they are 66%
similar (i.e. have amino acids with similar properties in the same positions) and 46%
identical to each other. All forms of ASTH1I and ASTH1J have a helix-turn-helix
motif, characteristic of some transcription factors, located near the carboxy terminal
end of the protein.
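The distinction between percent identity and percent similarity can be sketched in Python as follows. The grouping of amino acids by property, the function name and the two short peptides are assumptions made for illustration; the specification does not state which similarity scheme or alignment was used.

```python
# Hypothetical grouping of residues with similar properties (an assumption).
SIMILAR_GROUPS = ["AVLIMC", "FWY", "KRH", "DE", "STNQ", "G", "P"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}

def identity_and_similarity(seq_a, seq_b):
    """Percent identical and percent similar positions of two aligned sequences.

    Sequences must already be aligned to equal length; positions holding a
    gap character '-' in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned positions to compare")
    identical = sum(a == b for a, b in pairs)
    # Identical residues share a group, so they also count as similar.
    similar = sum(GROUP_OF[a] == GROUP_OF[b] for a, b in pairs)
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)

# Invented peptides, not taken from the sequence listing.
ident, simil = identity_and_similarity("KRDEL", "KHEEV")
```

Because identical positions are a subset of similar positions, percent similarity is never lower than percent identity, which matches the relationship between the two figures quoted above.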

ASTH1J produces an approximately 6 kb mRNA expressed at high levels in
the trachea, prostate and pancreas and at lower levels in colon, small intestine, lung
and stomach. ASTH1J has at least three forms, consisting of the alt1, alt2 and alt3
forms. The open reading frame is identical for the three forms, which differ only in
the 5' UTR. The protein encoded by ASTH1J is 300 amino acids in length.

Mouse coding region sequence of asth1j is provided in SEQ ID NO:326, and
the amino acid sequence is provided in SEQ ID NO:327. The mouse and human
proteins have 88.4% identity throughout their length. The match in the ets
domain is 100%. The mouse cDNA was identified by hybridization of a full-length
human cDNA to a mouse lung cDNA library (Stratagene).

The term "ASTH1 genes" is herein used generically to designate ASTH1I
and ASTH1J genes and their alternate forms. The two genes lie in opposite
orientations on a native chromosome, with the 5' regulatory sequences between
them. Part of the genomic sequence between the two coding regions is provided as
SEQ ID NO:1. The term "ASTH1 locus" is used herein to refer to the two genes in
all alternate forms and the genomic sequence that lies between the two genes.
Alternate forms include splicing variants, and polymorphisms in the sequence.
Specific polymorphic sequences are provided in SEQ ID NOs:16-159. For some
purposes the previously known EST sequences described herein may be excluded
from the sequences defined as the ASTH1 locus.

The DNA sequence encoding ASTH1 may be cDNA or genomic DNA or a
fragment thereof. The term "ASTH1 gene" shall be intended to mean the open
reading frame encoding specific ASTH1 polypeptides, introns, as well as adjacent 5'
and 3' non-coding nucleotide sequences involved in the regulation of expression,
up to about 1 kb beyond the coding region, but possibly further in either direction.
The gene may be introduced into an appropriate vector for extrachromosomal
maintenance or for integration into the host.

The term "cDNA" as used herein is intended to include all nucleic acids that
share the arrangement of sequence elements found in native mature mRNA
species, where sequence elements are exons and 3' and 5' non-coding regions.
Normally mRNA species have contiguous exons, with the intervening introns
removed by nuclear RNA splicing, to create a continuous open reading frame
encoding the ASTH1 protein.
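The relationship between an intron-containing genomic sequence and the contiguous exons of a cDNA can be sketched in a few lines of Python. The toy sequence, the exon coordinates and the function name are hypothetical and bear no relation to the sequence listing.

```python
def splice(genomic, exons):
    """Join exon intervals (0-based, end-exclusive) into one contiguous sequence."""
    return "".join(genomic[start:end] for start, end in exons)

# Toy genomic sequence: exons in upper case, a single intron in lower case.
genomic = "ATGGCC" + "gtaagtttcag" + "GAATAA"
exons = [(0, 6), (17, 23)]

cdna = splice(genomic, exons)
# With the intron removed, the reading frame runs uninterrupted from the
# initiation codon (ATG) to the stop codon (TAA).
codons = [cdna[i:i + 3] for i in range(0, len(cdna), 3)]
```

In this toy case the open reading frame is interrupted in the genomic sequence and continuous in the spliced product.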

The genomic ASTH1 sequence has non-contiguous open reading frames,
where introns interrupt the protein coding regions. A genomic sequence of interest
comprises the nucleic acid present between the initiation codon and the stop codon,
as defined in the listed sequences, including all of the introns that are normally
present in a native chromosome. It may further include the 3' and 5' untranslated
regions found in the mature mRNA. It may further include specific transcriptional
and translational regulatory sequences, such as promoters, enhancers, etc.,
including about 1 kb, but possibly more, of flanking genomic DNA at either the 5' or
3' end of the transcribed region. The genomic DNA may be isolated as a fragment
of 100 kbp or smaller, and substantially free of flanking chromosomal sequence.

Genomic regions of interest include the non-transcribed sequences 5' to
ASTH1J, as provided in SEQ ID NO:1. This region of DNA contains the native
promoter elements that direct expression of the linked ASTH1J gene. Usually a
promoter region will have at least about 140 nt of sequence located 5' to the ASTH1
gene and further comprising a TATA box and CAAT box motif sequence (SEQ ID
NO:14, nt 597-736). The promoter region may further comprise a consensus ets
binding motif, (C/A)GGA(A/T) (SEQ ID NO:14, nt 1-5). A region of particular
interest, containing the ets binding motif, TATA box and CAAT box motifs 5' to the
ASTH1J gene, is provided in SEQ ID NO:14. The position of SEQ ID NO:14 within
the larger sequence is SEQ ID NO:1, nt 60359-61095. The promoter sequence
may comprise polymorphisms within the CAAT box region, for example those
shown in SEQ ID NO:12 and SEQ ID NO:13, which have been shown to affect the
function of the promoter. The promoter region of interest may extend 5' to SEQ ID
NO:14 within the larger sequence, e.g. SEQ ID NO:1, nt 59000-61095; SEQ ID
NO:1, nt 5700-61095, etc.
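A search for the consensus ets binding motif (C/A)GGA(A/T) on both strands of a sequence can be sketched in Python with a regular expression. The test sequence and the function names are hypothetical; this is one simple way to locate such motifs, not a method described in the specification.

```python
import re

# (C/A)GGA(A/T); the lookahead lets overlapping matches be reported.
ETS_MOTIF = re.compile(r"(?=([CA]GGA[AT]))")
MOTIF_LENGTH = 5
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_ets_sites(seq):
    """Return (strand, 1-based start in forward coordinates, motif) tuples."""
    seq = seq.upper()
    hits = [("+", m.start() + 1, m.group(1)) for m in ETS_MOTIF.finditer(seq)]
    for m in ETS_MOTIF.finditer(reverse_complement(seq)):
        # Convert a position on the reverse strand to forward coordinates.
        start = len(seq) - (m.start() + MOTIF_LENGTH) + 1
        hits.append(("-", start, m.group(1)))
    return hits

# Invented sequence with one site on each strand.
sites = find_ets_sites("TTCAGGAAGTCCTTCCTG")
```

A hit on the minus strand appears on the forward strand as the reverse complement of the motif, here TTCCT.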

The sequence of this 5' region, and further 5' upstream sequences and 3'
downstream sequences, may be utilized for promoter elements, including enhancer
binding sites, that provide for expression in tissues where ASTH1J is expressed.
The tissue specific expression is useful for determining the pattern of expression,
and for providing promoters that mimic the native pattern of expression. Naturally
occurring polymorphisms in the promoter region are useful for determining natural
variations in expression, particularly those that may be associated with disease.
See, for example, SEQ ID NO:12 and 13. Alternatively, mutations may be
introduced into the promoter region to determine the effect of altering expression in
experimentally defined systems. Methods for the identification of specific DNA
motifs involved in the binding of transcriptional factors are known in the art, e.g.
sequence similarity to known binding motifs, gel retardation studies, etc. For
examples, see Blackwell et al. (1995) Mol Med 1:194-205; Mortlock et al. (1996)
Genome Res. 6:327-33; and Joulin and Richard-Foy (1995) Eur J Biochem
232:620-626.

The regulatory sequences may be used to identify cis acting sequences
required for transcriptional or translational regulation of ASTH1 expression,
especially in different tissues or stages of development, and to identify cis acting
sequences and trans acting factors that regulate or mediate ASTH1 expression.
Such transcriptional or translational control regions may be operably linked to an
ASTH1 gene in order to promote expression of wild type or altered ASTH1 or other
proteins of interest in cultured cells, or in embryonic, fetal or adult tissues, and for
gene therapy.

The nucleic acid compositions of the subject invention may encode all or a
part of the subject polypeptides. Fragments of the DNA sequence may be obtained
by chemically synthesizing oligonucleotides in accordance with conventional
methods, by restriction enzyme digestion, by PCR amplification, etc. For the most
part, DNA fragments will be of at least 15 nt, usually at least 18 nt, more usually at
least about 50 nt. Such small DNA fragments are useful as primers for PCR,
hybridization screening, etc. Larger DNA fragments, i.e. greater than 100 nt, are
useful for production of the encoded polypeptide. For use in amplification reactions,
such as PCR, a pair of primers will be used. The exact composition of the primer
sequences is not critical to the invention, but for most applications the primers will
hybridize to the subject sequence under stringent conditions, as known in the art. It
is preferable to choose a pair of primers that will generate an amplification product
of at least about 50 nt, preferably at least about 100 nt. Algorithms for the selection
of primer sequences are generally known, and are available in commercial software
packages. Amplification primers hybridize to complementary strands of DNA, and
will prime towards each other.
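The geometry of a primer pair, one primer per strand with each priming toward the other, can be illustrated with a short Python sketch that predicts the amplification product by exact string matching. All sequences and names are hypothetical, and exact matching is a simplifying assumption standing in for hybridization under stringent conditions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def amplicon(template, forward, reverse):
    """Predict the product on the top strand of template, or None.

    forward is identical to the top strand; reverse is written 5'->3' and is
    identical to the bottom strand, so its reverse complement is searched
    for downstream of the forward primer.
    """
    start = template.find(forward)
    if start < 0:
        return None
    site = template.find(reverse_complement(reverse), start + len(forward))
    if site < 0:
        return None
    return template[start:site + len(reverse)]

# Invented 18 nt primers flanking a 14 nt spacer.
forward = "ATGCATGCAAATTTCCCG"
reverse = "TACGTACGTTTAAACCCG"
template = ("GGATCC" + forward + "ACTGACTGACTGAC"
            + reverse_complement(reverse) + "GAATTC")
product = amplicon(template, forward, reverse)
```

Here two 18 nt primers and a 14 nt spacer give a 50 nt product, the lower bound suggested in the text.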

The ASTH1 genes are isolated and obtained in substantial purity, generally
as other than an intact mammalian chromosome. Usually, the DNA will be obtained
substantially free of other nucleic acid sequences that do not include an ASTH1
sequence or fragment thereof, generally being at least about 50%, usually at least
about 90% pure, and is typically "recombinant", i.e. flanked by one or more
nucleotides with which it is not normally associated on a naturally occurring
chromosome.

The DNA sequences are used in a variety of ways. They may be used as
probes for identifying ASTH1 related genes. Mammalian homologs have
substantial sequence similarity to the subject sequences, i.e. at least 75%, usually
at least 90%, more usually at least 95% sequence identity with the nucleotide
sequence of the subject DNA sequence. Sequence similarity is calculated based
on a reference sequence, which may be a subset of a larger sequence, such as a
conserved motif, coding region, flanking region, etc. A reference sequence will
usually be at least about 18 nt long, more usually at least about 30 nt long, and may
extend to the complete sequence that is being compared. Algorithms for sequence
analysis are known in the art, such as BLAST, described in Altschul et al. (1990) J
Mol Biol 215:403-10.

Nucleic acids having sequence similarity are detected by hybridization under
low stringency conditions, for example, at 50°C and 10XSSC (0.9 M saline/0.09 M
sodium citrate) and remain bound when subjected to washing at 55°C in 1XSSC.
Sequence identity may be determined by hybridization under stringent conditions,
for example, at 50°C or higher and 0.1XSSC (9 mM saline/0.9 mM sodium citrate).
By using probes, particularly labeled probes of DNA sequences, one can isolate
homologous or related genes. The source of homologous genes may be any
species, e.g. primate species, particularly human; rodents, such as rats and mice,
canines, felines, bovines, ovines, equines, yeast, Drosophila, Caenorhabditis, etc.

The DNA may also be used to identify expression of the gene in a biological
specimen. The manner in which one probes cells for the presence of particular
nucleotide sequences, as genomic DNA or RNA, is well established in the literature
and does not require elaboration here. mRNA is isolated from a cell sample.
mRNA may be amplified by RT-PCR, using reverse transcriptase to form a
complementary DNA strand, followed by polymerase chain reaction amplification
using primers specific for the subject DNA sequences. Alternatively, an mRNA
sample is separated by gel electrophoresis, transferred to a suitable support, e.g.
nitrocellulose, nylon, etc., and then probed with a fragment of the subject DNA as a
probe. Other techniques, such as oligonucleotide ligation assays, in situ
hybridizations, and hybridization to DNA probes arrayed on a solid chip may also
find use. Detection of mRNA hybridizing to the subject sequence is indicative of
ASTH1 gene expression in the sample.

The subject nucleic acid sequences may be modified for a number of
purposes, particularly where they will be used intracellularly, for example, by being
joined to a nucleic acid cleaving agent, e.g. a chelated metal ion, such as iron or
chromium for cleavage of the gene; or the like.

The sequence of the ASTH1 locus, including flanking promoter regions and
coding regions, may be mutated in various ways known in the art to generate
targeted changes in promoter strength, sequence of the encoded protein, etc. The
DNA sequence or product of such a mutation will be substantially similar to the
sequences provided herein, i.e. will differ by at least one nucleotide or amino acid,
respectively, and may differ by at least two but not more than about ten nucleotides
or amino acids. The sequence changes may be substitutions, insertions or
deletions. Deletions may further include larger changes, such as deletions of a
domain or exon. Other modifications of interest include epitope tagging, e.g. with
the FLAG system, HA, etc. For studies of subcellular localization, fusion proteins
with green fluorescent proteins (GFP) may be used. Such mutated genes may be
used to study structure-function relationships of ASTH1 polypeptides, or to alter
properties of the protein that affect its function or regulation. For example,
constitutively active transcription factors, or a dominant negatively active protein
that binds to the ASTH1 DNA target site without activating transcription, may be
created in this manner.

Techniques for in vitro mutagenesis of cloned genes are known. Examples
of protocols for scanning mutations may be found in Gustin et al., Biotechniques
14:22 (1993); Barany, Gene 37:111-23 (1985); Colicelli et al., Mol Gen Genet
199:537-9 (1985); and Prentki et al., Gene 29:303-13 (1984). Methods for site
specific mutagenesis can be found in Sambrook et al., Molecular Cloning: A
Laboratory Manual, CSH Press 1989, pp. 15.3-15.108; Weiner et al., Gene
126:35-41 (1993); Sayers et al., Biotechniques 13:592-6 (1992); Jones and
Winistorfer, Biotechniques 12:528-30 (1992); Barton et al., Nucleic Acids Res
18:7349-55 (1990); Marotti and Tomich, Gene Anal Tech 6:67-70 (1989); and Zhu,
Anal Biochem 177:120-4 (1989).

Synthesis of ASTH1 Proteins 
The subject gene may be employed for synthesis of a complete ASTH1
protein, or polypeptide fragments thereof, particularly fragments corresponding to
functional domains; binding sites; etc.; and including fusions of the subject
polypeptides to other proteins or parts thereof. For expression, an expression
cassette may be employed, providing for a transcriptional and translational initiation
region, which may be inducible or constitutive, where the coding region is operably
linked under the transcriptional control of the transcriptional initiation region, and a
transcriptional and translational termination region. Various transcriptional initiation
regions may be employed that are functional in the expression host.

The polypeptides may be expressed in prokaryotes or eukaryotes in
accordance with conventional ways, depending upon the purpose for expression.
For large scale production of the protein, a unicellular organism, such as E. coli, B.
subtilis, S. cerevisiae, or cells of a higher organism such as vertebrates, particularly
mammals, e.g. COS 7 cells, may be used as the expression host cells. In many
situations, it may be desirable to express the ASTH1 gene in mammalian cells,
where the ASTH1 protein will benefit from native folding and post-translational
modifications. Small peptides can also be synthesized in the laboratory.

With the availability of the polypeptides in large amounts, by employing an
expression host, the polypeptides may be isolated and purified in accordance with
conventional ways. A lysate may be prepared of the expression host and the lysate
purified using HPLC, exclusion chromatography, gel electrophoresis, affinity
chromatography, or other purification technique. The purified polypeptide will
generally be at least about 80% pure, preferably at least about 90% pure, and may
be up to and including 100% pure. Pure is intended to mean free of other proteins,
as well as cellular debris.

The polypeptide is used for the production of antibodies, where short
fragments provide for antibodies specific for the particular polypeptide, and larger
fragments or the entire protein allow for the production of antibodies over the
surface of the polypeptide. Antibodies may be raised to the wild-type or variant
forms of ASTH1. Antibodies may be raised to isolated peptides corresponding to
these domains, or to the native protein, e.g. by immunization with cells expressing
ASTH1, immunization with liposomes having ASTH1 inserted in the membrane, etc.

Antibodies are prepared in accordance with conventional ways, where the
expressed polypeptide or protein is used as an immunogen, by itself or conjugated
to known immunogenic carriers, e.g. KLH, pre-S HBsAg, other viral or eukaryotic
proteins, or the like. Various adjuvants may be employed, with a series of
injections, as appropriate. For monoclonal antibodies, after one or more booster
injections, the spleen is isolated, the lymphocytes immortalized by cell fusion, and
then screened for high affinity antibody binding. The immortalized cells, i.e.
hybridomas, producing the desired antibodies may then be expanded. For further
description, see Monoclonal Antibodies: A Laboratory Manual, Harlow and Lane
eds., Cold Spring Harbor Laboratories, Cold Spring Harbor, New York, 1988. If
desired, the mRNA encoding the heavy and light chains may be isolated and
mutagenized by cloning in E. coli, and the heavy and light chains mixed to further
enhance the affinity of the antibody. Alternatives to in vivo immunization as a
method of raising antibodies include binding to phage "display" libraries, usually in
conjunction with in vitro affinity maturation.

Detection of ASTH1 Associated Asthma 

Diagnosis of ASTH1 associated asthma is performed by protein, DNA or
RNA sequence and/or hybridization analysis of any convenient sample from a
patient, e.g. biopsy material, blood sample, scrapings from cheek, etc. A nucleic
acid sample from a patient having asthma that may be associated with ASTH1 is
analyzed for the presence of a predisposing polymorphism in ASTH1. A typical
patient genotype will have at least one predisposing mutation on at least one
chromosome. The presence of a polymorphic ASTH1 sequence that affects the
activity or expression of the gene product, and confers an increased susceptibility to
asthma, is considered a predisposing polymorphism. Individuals are screened by
analyzing their DNA or mRNA for the presence of a predisposing polymorphism, as
compared to an asthma neutral sequence. Specific sequences of interest include
any polymorphism that leads to clinical bronchial hyperreactivity or is otherwise
associated with asthma, including, but not limited to, insertions, substitutions and
deletions in the coding region sequence, intron sequences that affect splicing, or
promoter or enhancer sequences that affect the activity and expression of the
protein. Examples of specific ASTH1 polymorphisms in asthma patients are listed
in Tables 3-8.

WO 99/37809 PCT/US98/01260

The CAAT box polymorphism of SEQ ID NO:12 and 13 (which is located
within SEQ ID NO:14) is of particular interest. The "G" form, SEQ ID NO:13, can be
associated with a propensity to develop bronchial hyperreactivity or asthma. Other
polymorphisms in the surrounding region affect this association. It has been found
that substitution of "G" for "A" results in decreased binding of nuclear proteins to the
DNA motif.

The effect of an ASTH1 predisposing polymorphism may be modulated by
the patient genotype in other genes related to asthma and atopy, including, but not
limited to, the Fcε receptor, Class I and Class II HLA antigens, T cell receptor and
immunoglobulin genes, cytokines and cytokine receptors, and the like.

Screening may also be based on the functional or antigenic characteristics of
the protein. Immunoassays designed to detect predisposing polymorphisms in
ASTH1 proteins may be used in screening. Where many diverse mutations lead to
a particular disease phenotype, functional protein assays have proven to be
effective screening tools.
Biochemical studies may be performed to determine whether a candidate
sequence polymorphism in the ASTH1 coding region or control regions is
associated with disease. For example, a change in the promoter or enhancer
sequence that affects expression of ASTH1 may result in predisposition to asthma.
Expression levels of a candidate variant allele are compared to expression levels of
the normal allele by various methods known in the art. Methods for determining
promoter or enhancer strength include quantitation of the expressed natural protein;
insertion of the variant control element into a vector with a reporter gene such as
β-galactosidase, luciferase, chloramphenicol acetyltransferase, etc. that provides
for convenient quantitation; and the like. The activity of the encoded ASTH1 protein
may be determined by comparison with the wild-type protein.
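A comparison of promoter strength between a variant and a normal control element in a reporter assay can be sketched in Python as a ratio of normalized signals. Normalizing each reading to a co-transfected control reporter is an assumption of this sketch and is not a step described in the specification; the function name and all readings are invented.

```python
from statistics import mean

def relative_promoter_activity(variant, normal):
    """Mean normalized reporter signal of the variant relative to the normal allele.

    Each argument is a list of (reporter_signal, control_signal) pairs, one
    per replicate. Dividing by the control signal is assumed here as a
    correction for differences in transfection efficiency.
    """
    def normalized(replicates):
        return mean(reporter / control for reporter, control in replicates)

    return normalized(variant) / normalized(normal)

# Invented readings for two replicates of each construct.
normal_allele = [(1000.0, 100.0), (1200.0, 100.0)]
variant_allele = [(500.0, 100.0), (600.0, 100.0)]
ratio = relative_promoter_activity(variant_allele, normal_allele)
```

A ratio below 1 would suggest that the variant control element drives weaker expression than the normal one under the assay conditions.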
A number of methods are available for analyzing nucleic acids for the
presence of a specific sequence. Where large amounts of DNA are available,
genomic DNA is used directly. Alternatively, the region of interest is cloned into a
suitable vector and grown in sufficient quantity for analysis. Cells that express
ASTH1 genes, such as trachea cells, may be used as a source of mRNA, which
may be assayed directly or reverse transcribed into cDNA for analysis. The nucleic
acid may be amplified by conventional techniques, such as the polymerase chain
reaction (PCR), to provide sufficient amounts for analysis. The use of the
polymerase chain reaction is described in Saiki, et al. (1985) Science 239:487, and
a review of current techniques may be found in Sambrook, et al. Molecular Cloning:
A Laboratory Manual, CSH Press 1989, pp. 14.2-14.33. Amplification may also be
used to determine whether a polymorphism is present, by using a primer that is
specific for the polymorphism. Alternatively, various methods are known in the art
that utilize oligonucleotide ligation as a means of detecting polymorphisms; for
examples see Riley et al. (1990) N.A.R. 18:2887-2890; and Delahunty et al. (1996)
Am. J. Hum. Genet. 58:1239-1246.

A detectable label may be included in an amplification reaction. Suitable
labels include fluorochromes, e.g. fluorescein isothiocyanate (FITC), rhodamine,
Texas Red, phycoerythrin, allophycocyanin, 6-carboxyfluorescein (6-FAM),
2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein (JOE), 6-carboxy-X-rhodamine
(ROX), 6-carboxy-2',4',7',4,7-hexachlorofluorescein (HEX), 5-carboxyfluorescein
(5-FAM) or N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA); radioactive labels,
e.g. 32P, 35S, 3H; etc. The label may be a two stage system, where the amplified
DNA is conjugated to biotin, haptens, etc. having a high affinity binding partner, e.g.
avidin, specific antibodies, etc., where the binding partner is conjugated to a
detectable label. The label may be conjugated to one or both of the primers.
Alternatively, the pool of nucleotides used in the amplification is labeled, so as to
incorporate the label into the amplification product.

The sample nucleic acid, e.g. amplified or cloned fragment, is analyzed by
one of a number of methods known in the art. The nucleic acid may be sequenced
by dideoxy or other methods, and the sequence of bases compared to a neutral
ASTH1 sequence. Hybridization with the variant sequence may also be used to
determine its presence, by Southern blots, dot blots, etc. The hybridization pattern
of a control and variant sequence to an array of oligonucleotide probes immobilised
on a solid support, as described in US 5,445,934, or in WO95/35505, may also be
used as a means of detecting the presence of variant sequences. Single strand
conformational polymorphism (SSCP) analysis, denaturing gradient gel
electrophoresis (DGGE), mismatch cleavage detection, and heteroduplex analysis
in gel matrices are used to detect conformational changes created by DNA
sequence variation as alterations in electrophoretic mobility. Alternatively, where a
polymorphism creates or destroys a recognition site for a restriction endonuclease
(restriction fragment length polymorphism, RFLP), the sample is digested with that
endonuclease, and the products size fractionated to determine whether the
fragment was digested. Fractionation is performed by gel or capillary
electrophoresis, particularly acrylamide or agarose gels.
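
The RFLP logic described above can be illustrated with a short sketch that predicts
fragment sizes for two alleles. The sequences are hypothetical and are not ASTH1
sequence; the EcoRI site (GAATTC, cut after the first G) is used only as a familiar
example, and the function names are assumptions made for illustration.

```python
def cut_positions(sequence, site):
    """Return 0-based start positions of every occurrence of a recognition site."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions

def fragment_lengths(sequence, site, cut_offset):
    """Fragment lengths after complete digestion of a linear molecule.

    cut_offset is the position of the cut within the site on the given strand.
    Only one strand is scanned, which is sufficient for a palindromic site.
    """
    cuts = [p + cut_offset for p in cut_positions(sequence, site)]
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Hypothetical 20 nt amplicons that differ at a single base.
allele_1 = "ACGTACGGAATTCTTGACCA"  # contains GAATTC
allele_2 = "ACGTACGGAGTTCTTGACCA"  # A->G change destroys the site

digest_1 = fragment_lengths(allele_1, "GAATTC", 1)
digest_2 = fragment_lengths(allele_2, "GAATTC", 1)
```

Two fragments for one allele and a single uncut fragment for the other is the
pattern that size fractionation would be expected to resolve.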

The hybridization pattern of a control and variant sequence to an array of
oligonucleotide probes immobilised on a solid support, as described in US
5,445,934, or in WO95/35505, may be used as a means of detecting the presence
of variant sequences. In one embodiment of the invention, an array of
oligonucleotides is provided, where discrete positions on the array are
complementary to at least a portion of mRNA or genomic DNA of the ASTH1 locus.
Such an array may comprise a series of oligonucleotides, each of which can
specifically hybridize to a nucleic acid, e.g. mRNA, cDNA, genomic DNA, etc. from
the ASTH1 locus.

An array may include all or a subset of the polymorphisms listed in Table 3
(SEQ ID NOs:16-126). One or both polymorphic forms may be present in the array,
for example the polymorphism of SEQ ID NOs:12 and 13 may be represented by
either, or both, of the listed sequences. Usually such an array will include at least 2
different polymorphic sequences, i.e. polymorphisms located at unique positions
within the locus, usually at least about 5, more usually at least about 10, and may
include as many as 50 to 100 different polymorphisms. The oligonucleotide
sequence on the array will usually be at least about 12 nt in length, may be the
length of the provided polymorphic sequences, or may extend into the flanking
regions to generate fragments of 100 to 200 nt in length. For examples of arrays,
see Hacia et al. (1996) Nature Genetics 14:441-447; Lockhart et al. (1996) Nature
Biotechnol. 14:1675-1680; and DeRisi et al. (1996) Nature Genetics 14:457-460.
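
As a rough sketch of how probes of about 12 nt might be derived for both forms of
a single-nucleotide polymorphism, the following enumerates every 12-mer that
covers the polymorphic base. The flanking sequences are hypothetical placeholders
rather than sequences from Table 3, and the function name is an assumption.

```python
def tile_probes(flank5, alleles, flank3, length=12):
    """For each single-base allele, list every probe of `length` nt that
    covers the polymorphic position, drawing on the flanking sequence."""
    probes = {}
    for allele in alleles:
        seq = flank5 + allele + flank3
        pos = len(flank5)                     # index of the polymorphic base
        first = max(0, pos - length + 1)      # leftmost window still covering pos
        last = min(pos, len(seq) - length)    # rightmost window still covering pos
        probes[allele] = [seq[i:i + length] for i in range(first, last + 1)]
    return probes

# Hypothetical flanks; "A" and "G" stand for the two polymorphic forms.
probes = tile_probes("ACGTACGTACGT", ["A", "G"], "TTGACCATGGCA")
```

Longer probes that extend into the flanking regions would be produced the same
way by raising `length`, provided enough flanking sequence is supplied.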

Antibodies specific for ASTH1 polymorphisms may be used in screening
immunoassays. A reduction or increase in neutral ASTH1 and/or presence of
asthma associated polymorphisms is indicative that asthma is ASTH1-associated.
A sample is taken from a patient suspected of having ASTH1-associated asthma.
Samples, as used herein, include biological fluids such as tracheal lavage, blood,
cerebrospinal fluid, tears, saliva, lymph, dialysis fluid and the like; organ or tissue
culture derived fluids; and fluids extracted from physiological tissues. Also included
in the term are derivatives and fractions of such fluids. Biopsy samples are of
particular interest, e.g. trachea scrapings, etc. The number of cells in a sample will
generally be at least about 10^3, usually at least 10^4, more usually at least about 10^5.
The cells may be dissociated, in the case of solid tissues, or tissue sections may be
analyzed. Alternatively a lysate of the cells may be prepared.

Diagnosis may be performed by a number of methods. The different
methods all determine the absence or presence or altered amounts of normal or
abnormal ASTH1 in patient cells suspected of having a predisposing polymorphism
in ASTH1. For example, detection may utilize staining of cells or histological
sections, performed in accordance with conventional methods. The antibodies of
interest are added to the cell sample, and incubated for a period of time sufficient to
allow binding to the epitope, usually at least about 10 minutes. The antibody may
be labeled with radioisotopes, enzymes, fluorescers, chemiluminescers, or other
labels for direct detection. Alternatively, a second stage antibody or reagent is used
to amplify the signal. Such reagents are well known in the art. For example, the
primary antibody may be conjugated to biotin, with horseradish peroxidase-conjugated
avidin added as a second stage reagent. Final detection uses a
substrate that undergoes a color change in the presence of the peroxidase. The
absence or presence of antibody binding may be determined by various methods,
including flow cytometry of dissociated cells, microscopy, radiography, scintillation
counting, etc.

An alternative method for diagnosis depends on the in vitro detection of
binding between antibodies and ASTH1 in a lysate. Measuring the concentration of
ASTH1 binding in a sample or fraction thereof may be accomplished by a variety of
specific assays. A conventional sandwich type assay may be used. For example, a
sandwich assay may first attach ASTH1-specific antibodies to an insoluble surface
or support. The particular manner of binding is not crucial so long as it is
compatible with the reagents and overall methods of the invention. The antibodies
may be bound to the plates covalently or non-covalently, preferably non-covalently.

The insoluble supports may be any composition to which polypeptides can
be bound, which is readily separated from soluble material, and which is otherwise
compatible with the overall method. The surface of such supports may be solid or
porous and of any convenient shape. Examples of suitable insoluble supports to
which the receptor is bound include beads, e.g. magnetic beads, membranes and
microtiter plates. These are typically made of glass, plastic (e.g. polystyrene),
polysaccharides, nylon or nitrocellulose. Microtiter plates are especially convenient
because a large number of assays can be carried out simultaneously, using small
amounts of reagents and samples.

Patient sample lysates are then added to separately assayable supports (for
example, separate wells of a microtiter plate) containing antibodies. Preferably, a
series of standards, containing known concentrations of normal and/or abnormal
ASTH1, is assayed in parallel with the samples or aliquots thereof to serve as
controls. Preferably, each sample and standard will be added to multiple wells so
that mean values can be obtained for each. The incubation time should be
sufficient for binding; generally, from about 0.1 to 3 hr is sufficient. After incubation,
the insoluble support is generally washed of non-bound components. Generally, a
dilute non-ionic detergent medium at an appropriate pH, generally 7-8, is used as a
wash medium. From one to six washes may be employed, with sufficient volume to
thoroughly wash non-specifically bound proteins present in the sample.
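
The use of replicate wells and a series of standards can be sketched as a simple
standard curve. This is an illustration under stated assumptions: the signal is
taken to rise strictly with concentration, linear interpolation between
neighbouring standards is one possible choice among several, and all
concentrations and readings are made-up values in arbitrary units.

```python
from statistics import mean

def standard_curve(standards):
    """standards maps a known concentration to its replicate well signals.
    Returns (concentration, mean signal) pairs sorted by concentration."""
    return sorted((conc, mean(wells)) for conc, wells in standards.items())

def interpolate(curve, signal):
    """Estimate concentration by linear interpolation between the two
    standards whose mean signals bracket the sample signal."""
    for (c0, s0), (c1, s1) in zip(curve, curve[1:]):
        if s0 <= signal <= s1:
            return c0 + (signal - s0) * (c1 - c0) / (s1 - s0)
    raise ValueError("signal lies outside the range of the standards")

# Made-up absorbance readings: duplicate wells for each standard and the sample.
curve = standard_curve({0.0: [0.05, 0.07], 10.0: [0.50, 0.54], 50.0: [1.48, 1.52]})
sample_signal = mean([1.00, 1.02])
estimate = interpolate(curve, sample_signal)
```

Samples whose signal falls outside the standards raise an error rather than being
extrapolated, which mirrors the role of the standards as controls.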

After washing, a solution containing a second antibody is applied. The
antibody will bind ASTH1 with sufficient specificity such that it can be distinguished
from other components present. The second antibodies may be labeled to facilitate
direct or indirect quantification of binding. Examples of labels that permit direct
measurement of second receptor binding include radiolabels, such as 3H or 125I,
fluorescers, dyes, beads, chemiluminescers, colloidal particles, and the like.
Examples of labels which permit indirect measurement of binding include enzymes
where the substrate may provide for a colored or fluorescent product. In a preferred
embodiment, the antibodies are labeled with a covalently bound enzyme capable of
providing a detectable product signal after addition of suitable substrate. Examples
of suitable enzymes for use in conjugates include horseradish peroxidase, alkaline
phosphatase, malate dehydrogenase and the like. Where not commercially
available, such antibody-enzyme conjugates are readily produced by techniques
known to those skilled in the art. The incubation time should be sufficient for the
labeled ligand to bind available molecules. Generally, from about 0.1 to 3 hr is
sufficient, usually 1 hr sufficing.

After the second binding step, the insoluble support is again washed free of 
non-specifically bound material. The signal produced by the bound conjugate is 
detected by conventional means. Where an enzyme conjugate is used, an 
appropriate enzyme substrate is provided so a detectable product is formed. 

Other immunoassays are known in the art and may find use as diagnostics. 
Ouchterlony plates provide a simple determination of antibody binding. Western 
blots may be performed on protein gels or protein spots on filters, using a detection 
system specific for ASTH1 as desired, conveniently using a labeling method as 
described for the sandwich assay. 

Other diagnostic assays of interest are based on the functional properties of 
ASTH1 proteins. Such assays are particularly useful where a large number of 
different sequence changes lead to a common phenotype, i.e. altered protein 
function leading to bronchial hyperreactivity. For example, a functional assay may 
be based on the transcriptional changes mediated by ASTH1 gene products. Other 
assays may, for example, detect conformational changes, size changes resulting 
from insertions, deletions or truncations, or changes in the subcellular localization of 
ASTH1 proteins. 

In a protein truncation test, PCR fragments amplified from the ASTH1 gene 
or its transcript are used as templates for in vitro transcription/translation reactions
to generate protein products. Separation by gel electrophoresis is performed to 
determine whether the polymorphic gene encodes a truncated protein, where 
truncations may be associated with a loss of function. 
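
The idea behind the protein truncation test can be mirrored computationally by
translating a reference and a variant coding sequence and comparing product
lengths. The sketch below uses the standard genetic code; the two short sequences
are hypothetical and do not come from ASTH1, and the function names are
assumptions. Input is expected to contain only A, C, G and T.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(cdna):
    """Translate from the first ATG up to, but not including, the first stop codon."""
    cdna = cdna.upper()
    start = cdna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(cdna) - 2, 3):
        amino_acid = CODONS[cdna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def is_truncated(reference, variant):
    """True when the variant encodes a shorter product than the reference."""
    return len(translate(variant)) < len(translate(reference))

reference = "ATGGCTCAGAAATGGTAA"  # hypothetical: M A Q K W stop
variant = "ATGGCTTAGAAATGGTAA"    # hypothetical: CAG -> TAG introduces an early stop
```

In the laboratory test the shorter product is seen as a faster-migrating band on
the gel; here it appears as a shorter translated string.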

Diagnostic screening may also be performed for polymorphisms that are
genetically linked to a predisposition for bronchial hyperreactivity, particularly
through the use of microsatellite markers or single nucleotide polymorphisms.
Frequently the microsatellite polymorphism itself is not phenotypically expressed,
but is linked to sequences that result in a disease predisposition. However, in some
cases the microsatellite sequence itself may affect gene expression. Microsatellite
linkage analysis may be performed alone, or in combination with direct detection of
polymorphisms, as described above. The use of microsatellite markers for
genotyping is well documented. For examples, see Mansfield et al. (1994)
Genomics 24:225-233; Ziegle et al. (1992) Genomics 14:1026-1031; Dib et al.,
supra.

Microsatellite loci that are useful in the subject methods have the general
formula:

U (R)n U', where

U and U' are non-repetitive flanking sequences that uniquely identify the particular
locus, R is a repeat motif, and n is the number of repeats. The repeat motif is at
least 2 nucleotides in length, up to 7, usually 2-4 nucleotides in length. Repeats
can be simple or complex. The flanking sequences U and U' uniquely identify the
microsatellite locus within the human genome. U and U' are at least about 18
nucleotides in length, and may extend several hundred bases up to about 1 kb on
either side of the repeat. Within U and U', sequences are selected for amplification
primers. The exact composition of the primer sequences is not critical to the
invention, but they must hybridize to the flanking sequences U and U', respectively,
under stringent conditions. Criteria for selection of amplification primers are as
previously discussed. To maximize the resolution of size differences at the locus, it
is preferable to choose a primer sequence that is close to the repeat sequence, such
that the total amplification product is between 100-500 nucleotides in length.

The number of repeats at a specific locus, n, is polymorphic in a population,
thereby generating individual differences in the length of DNA that lies between the
amplification primers. The number will vary from at least 1 repeat to as many as
about 100 repeats or more.

The primers are used to amplify the region of genomic DNA that contains the
repeats. Conveniently, a detectable label will be included in the amplification 
reaction, as previously described. Multiplex amplification may be performed in 
which several sets of primers are combined in the same reaction tube. This is 
particularly advantageous when limited amounts of sample DNA are available for 
analysis. Conveniently, each of the sets of primers is labeled with a different 
fluorochrome. 

After amplification, the products are size fractionated. Fractionation may be 
performed by gel electrophoresis, particularly denaturing acrylamide or agarose 
gels. A convenient system uses denaturing polyacrylamide gels in combination with 
an automated DNA sequencer, see Hunkapiller et al. (1991) Science 254:59-74.
The automated sequencer is particularly useful with multiplex amplification or
pooled products of separate PCR reactions. Capillary electrophoresis may also be
used for fractionation. A review of capillary electrophoresis may be found in
Landers, et al. (1993) BioTechniques 14:98-111. The size of the amplification
product is proportional to the number of repeats (n) that are present at the locus 
specified by the primers. The size will be polymorphic in the population, and is 
therefore an allelic marker for that locus. 
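
The relationship between repeat number and product size for a locus of the form
U (R)n U' can be sketched as below. The flanking sequences and the repeat count
are hypothetical, chosen only to make the arithmetic visible, and the function
names are assumptions; the left-to-right scan is a simple way to find tandem runs
and is not meant as a genotyping algorithm.

```python
import re

def amplicon_length(flank5_len, motif, n, flank3_len):
    """Length of a product spanning U, n copies of the repeat motif R, and U'."""
    return flank5_len + len(motif) * n + flank3_len

def count_repeats(sequence, motif):
    """Number of motif copies in the longest tandem run found by a
    left-to-right scan of the sequence."""
    runs = re.findall(f"(?:{re.escape(motif.upper())})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)

# Hypothetical flanks around nine copies of a TATC repeat.
u, u_prime = "GATTACAGG", "CCTGAAGT"
allele = u + "TATC" * 9 + u_prime

n = count_repeats(allele, "TATC")
size = amplicon_length(len(u), "TATC", n, len(u_prime))
size_shift = amplicon_length(len(u), "TATC", n + 3, len(u_prime)) - size
```

Each added tetranucleotide repeat lengthens the product by 4 nt, which is the
size difference that electrophoresis has to resolve.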

A number of markers in the region of the ASTH1 locus have been identified, 
and are listed in Table 1 in the Experimental section (SEQ ID NOs: 160-273). Of 
particular interest for diagnostic purposes is the marker D11S2008, in which
individuals having alleles C or F at this locus, particularly in combination with the
CAAT box polymorphism and other polymorphisms, are predisposed to develop
bronchial hyperreactivity or asthma. The association of D11S2008 alleles is as
follows: 

Allele    Association with asthma    Number of TATC repeats (SEQ ID NO:15)
                                     relative to allele C

A         no                         -2
B         no                         -1
C         yes                        equivalent
D         no                         +1
E         no                         +2
F         yes                        +3
G         no                         +4
H         no                         +5

A DNA sequence of interest for diagnosis comprises the D11S2008 primer
sequences shown in Table 1 (SEQ ID NO:242 and 243), flanking one or three
repeats of SEQ ID NO:15.
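
The D11S2008 allele associations given above can be encoded as a small lookup.
This is only an illustrative sketch: the repeat offsets and associations are
transcribed from the allele listing, the names are hypothetical, and because the
text describes the marker as informative particularly in combination with other
polymorphisms, a positive flag here is not a diagnosis.

```python
# Allele -> (TATC repeats relative to allele C, associated with asthma)
D11S2008_ALLELES = {
    "A": (-2, False), "B": (-1, False), "C": (0, True), "D": (1, False),
    "E": (2, False), "F": (3, True), "G": (4, False), "H": (5, False),
}

def allele_from_offset(offset):
    """Name the allele whose repeat count differs from allele C by `offset`."""
    for name, (repeat_offset, _) in D11S2008_ALLELES.items():
        if repeat_offset == offset:
            return name
    return None  # outside the range of listed alleles

def carries_associated_allele(genotype):
    """True when either allele of a genotype is listed as associated."""
    return any(D11S2008_ALLELES[allele][1] for allele in genotype)

called = (allele_from_offset(-2), allele_from_offset(3))
flagged = carries_associated_allele(called)
```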

Other microsatellite markers of interest for diagnostic purposes are CA39_2;
774F; 774J; 7740; L19PENTA1; 65P14TE1; AFM205YG5; D11S907; D11S4200;
774N; CA11-11; 774L; AFM283WH9; ASMI14 and D11S1900 (primer sequences
are provided in Table 1, the repeats are provided in Table 1B).

Regulation of ASTH1 Expression 
The ASTH1 genes are useful for analysis of ASTH1 expression, e.g. in
determining developmental and tissue specific patterns of expression, and for
modulating expression in vitro and in vivo. The regulatory region of SEQ ID NO:1
may also be used to investigate ASTH1 expression. Vectors useful for
introduction of the gene include plasmids and viral vectors. Of particular interest
are retroviral-based vectors, e.g. Moloney murine leukemia virus and modified
human immunodeficiency virus; adenovirus vectors, etc. that are maintained
transiently or stably in mammalian cells. A wide variety of vectors can be employed
for transfection and/or integration of the gene into the genome of the cells.
Alternatively, micro-injection, fusion, or the like may be employed for introduction of
genes into a suitable host cell. See, for example, Dhawan et al. (1991) Science
254:1509-1512 and Smith et al. (1990) Molecular and Cellular Biology 3268-3271.

Administration of vectors to the lungs is of particular interest. Frequently
such methods utilize liposomal formulations, as described in Eastman et al. (1997)
Hum Gene Ther 8:765-773; Oudrhiri et al. (1997) P.N.A.S. 94:1651-1656;
McDonald et al. (1997) Hum Gene Ther 8:411-422.

The expression vector will have a transcriptional initiation region oriented to
produce functional mRNA. The native transcriptional initiation region, e.g. SEQ ID
NO:14, or an exogenous transcriptional initiation region may be employed. The
promoter may be introduced by recombinant methods in vitro, or as the result of
homologous integration of the sequence into a chromosome. Many strong
promoters are known in the art, including the β-actin promoter, SV40 early and late
promoters, human cytomegalovirus promoter, retroviral LTRs, metallothionein
responsive element (MRE), tetracycline-inducible promoter constructs, etc.




WO 99/37809 PCT/US98/01 260 

Expression vectors generally have convenient restriction sites located near
the promoter sequence to provide for the insertion of nucleic acid sequences.
Transcription cassettes may be prepared comprising a transcription initiation region,
the target gene or fragment thereof, and a transcriptional termination region. The
transcription cassettes may be introduced into a variety of vectors, e.g. plasmid;
retrovirus, e.g. lentivirus; adenovirus; and the like, where the vectors are able to
transiently or stably be maintained in the cells, usually for a period of at least about
one day, more usually for a period of at least about several days to several weeks.

Antisense molecules are used to down-regulate expression of ASTH1 in
cells. The anti-sense reagent may be antisense oligonucleotides (ODN),
particularly synthetic ODN having chemical modifications from native nucleic acids,
or nucleic acid constructs that express such anti-sense molecules as RNA. The
antisense sequence is complementary to the mRNA of the targeted gene, and
inhibits expression of the targeted gene products. Antisense molecules inhibit gene
expression through various mechanisms, e.g. by reducing the amount of mRNA
available for translation, through activation of RNAse H, or steric hindrance. One or
a combination of antisense molecules may be administered, where a combination
may comprise multiple different sequences.

Antisense molecules may be produced by expression of all or a part of the
target gene sequence in an appropriate vector, where the transcriptional initiation is
oriented such that an antisense strand is produced as an RNA molecule.
Alternatively, the antisense molecule is a synthetic oligonucleotide. Antisense
oligonucleotides will generally be at least about 7, usually at least about 12, more
usually at least about 20 nucleotides in length, and not more than about 500,
usually not more than about 50, more usually not more than about 35 nucleotides in
length, where the length is governed by efficiency of inhibition, specificity, including
absence of cross-reactivity, and the like. It has been found that short
oligonucleotides, of from 7 to 8 bases in length, can be strong and selective
inhibitors of gene expression (see Wagner et al. (1996) Nature Biotechnology
14:840-844).

A specific region or regions of the endogenous sense strand mRNA
sequence is chosen to be complemented by the antisense sequence. Selection of
a specific sequence for the oligonucleotide may use an empirical method, where
several candidate sequences are assayed for inhibition of expression of the target
gene in an in vitro or animal model. A combination of sequences may also be used,
where several regions of the mRNA sequence are selected for antisense
complementation.
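
Generating a panel of candidate antisense sequences for empirical testing can be
sketched as taking the reverse complement of windows along the mRNA. The example
below is an assumption-laden illustration: the mRNA fragment is hypothetical, the
window length and step are arbitrary choices within the ranges discussed above,
and the candidates are written as unmodified DNA, 5' to 3'.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def antisense_candidates(mrna, length=20, step=5):
    """Return (start, oligo) pairs, where each oligo is the DNA reverse
    complement of the mRNA window beginning at `start` (0-based)."""
    sense = mrna.upper().replace("U", "T")   # accept RNA or cDNA spelling
    candidates = []
    for start in range(0, len(sense) - length + 1, step):
        window = sense[start:start + length]
        candidates.append((start, window.translate(COMPLEMENT)[::-1]))
    return candidates

# Hypothetical 18 nt mRNA fragment, tiled with 12-mers every 3 nt.
candidates = antisense_candidates("AUGGCUCAGAAAUGGUAA", length=12, step=3)
```

Each candidate would still need to be assayed for inhibition and checked for
cross-reactivity, as the text notes; the code only enumerates sequences.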

Antisense oligonucleotides may be chemically synthesized by methods
known in the art (see Wagner et al. (1993) supra, and Milligan et al., supra.)
Preferred oligonucleotides are chemically modified from the native phosphodiester
structure, in order to increase their intracellular stability and binding affinity. A
number of such modifications have been described in the literature, which alter the
chemistry of the backbone, sugars or heterocyclic bases.

Among useful changes in the backbone chemistry are phosphorothioates;
phosphorodithioates, where both of the non-bridging oxygens are substituted with
sulfur; phosphoroamidites; alkyl phosphotriesters and boranophosphates. Achiral
phosphate derivatives include 3'-O-5'-S-phosphorothioate, 3'-S-5'-O-phosphorothioate,
3'-CH2-5'-O-phosphonate and 3'-NH-5'-O-phosphoroamidate.
Peptide nucleic acids replace the entire ribose phosphodiester backbone with a
peptide linkage. Sugar modifications are also used to enhance stability and affinity.
The α-anomer of deoxyribose may be used, where the base is inverted with respect
to the natural β-anomer. The 2'-OH of the ribose sugar may be altered to form
2'-O-methyl or 2'-O-allyl sugars, which provides resistance to degradation without
compromising affinity. Modification of the heterocyclic bases must maintain proper
base pairing. Some useful substitutions include deoxyuridine for deoxythymidine;
5-methyl-2'-deoxycytidine and 5-bromo-2'-deoxycytidine for deoxycytidine.
5-propynyl-2'-deoxyuridine and 5-propynyl-2'-deoxycytidine have been shown to
increase affinity and biological activity when substituted for deoxythymidine and
deoxycytidine, respectively.

As an alternative to anti-sense inhibitors, catalytic nucleic acid compounds,
e.g. ribozymes, anti-sense conjugates, etc. may be used to inhibit gene expression.
Ribozymes may be synthesized in vitro and administered to the patient, or may be
encoded on an expression vector, from which the ribozyme is synthesized in the
targeted cell (for example, see International patent application WO 9523225, and
Beigelman et al. (1995) Nucl. Acids Res 23:4434-42). Examples of oligonucleotides
with catalytic activity are described in WO 9506764. Conjugates of anti-sense ODN
with a metal complex, e.g. terpyridylCu(II), capable of mediating mRNA hydrolysis
are described in Bashkin et al. (1995) Appl Biochem Biotechnol 54:43-56.

Therapeutic Use of ASTH1 Protein
A host may be treated with intact ASTH1 protein, or an active fragment
thereof, to modulate or reduce bronchial hyperreactivity. Desirably, the peptides will
not induce an immune response, particularly an antibody response. Xenogeneic
analogs may be screened for their ability to provide a therapeutic effect without
raising an immune response. The protein or peptides may also be administered to
in vitro cell cultures.

Various methods for administration may be employed. The polypeptide
formulation may be given orally, or may be injected intravascularly, subcutaneously,
peritoneally, etc. Methods of administration by inhalation are well-known in the art.
The dosage of the therapeutic formulation will vary widely, depending upon the
nature of the disease, the frequency of administration, the manner of administration,
the clearance of the agent from the host, and the like. The initial dose may be
larger, followed by smaller maintenance doses. The dose may be administered as
infrequently as weekly or biweekly, or fractionated into smaller doses and
administered daily, semi-weekly, etc. to maintain an effective dosage level. In many
cases, oral administration will require a higher dose than if administered
intravenously. The amide bonds, as well as the amino and carboxy termini, may be
modified for greater stability on oral administration.

The subject peptides may be prepared as formulations at a
pharmacologically effective dose in pharmaceutically acceptable media, for
example normal saline, PBS, etc. The additives may include bactericidal agents,
stabilizers, buffers, or the like. In order to enhance the half-life of the subject
peptide or subject peptide conjugates, the peptides may be encapsulated,
introduced into the lumen of liposomes, prepared as a colloid, or another
conventional technique may be employed that provides for an extended lifetime of
the peptides.


The peptides may be administered as a combination therapy with other 
pharmacologically active agents. The additional drugs may be administered 
separately or in conjunction with the peptide compositions, and may be included in 
the same formulation. 

Models for Asthma

The subject nucleic acids can be used to generate genetically modified
non-human animals or site specific gene modifications in cell lines. The term
"transgenic" is intended to encompass genetically modified animals having a
deletion or other knock-out of ASTH1 gene activity, having an exogenous ASTH1
gene that is stably transmitted in the host cells, or having an exogenous ASTH1
promoter operably linked to a reporter gene. Transgenic animals may be made
through homologous recombination, where the ASTH1 locus is altered.
Alternatively, a nucleic acid construct is randomly integrated into the genome.
Vectors for stable integration include plasmids, retroviruses and other animal
viruses, YACs, and the like. Of interest are transgenic mammals, e.g. cows, pigs,
goats, horses, etc., and particularly rodents, e.g. rats, mice, etc.

A "knock-out" animal is genetically manipulated to substantially reduce, or 
eliminate endogenous ASTH1 function. Different approaches may be used to 
achieve the "knock-out". A chromosomal deletion of all or part of the native ASTH1
homolog may be induced. Deletions of the non-coding regions, particularly the
promoter region, 3' regulatory sequences, enhancers, or deletions of genes that
activate expression of ASTH1 genes, may also be used. A functional knock-out may
also be achieved by the introduction of an anti-sense construct that blocks
expression of the native ASTH1 genes (for example, see Li and Cohen (1996) Cell
85:319-329).

Transgenic animals may be made having exogenous ASTH1 genes. The
exogenous gene is usually either from a different species than the animal host, or is
otherwise altered in its coding or non-coding sequence. The introduced gene may
be a wild-type gene, naturally occurring polymorphism, or a genetically manipulated
sequence, for example those previously described with deletions, substitutions or
insertions in the coding or non-coding regions. The introduced sequence may
encode an ASTH1 polypeptide, or may utilize the ASTH1 promoter operably linked
to a reporter gene. Where the introduced gene is a coding sequence, it is usually
operably linked to a promoter, which may be constitutive or inducible, and other
regulatory sequences required for expression in the host animal.

Specific constructs of interest include, but are not limited to, anti-sense
ASTH1, which will block ASTH1 expression, expression of dominant negative
ASTH1 mutations, and over-expression of an ASTH1 gene. A detectable marker,
such as lacZ, may be introduced into the ASTH1 locus, where upregulation of
ASTH1 expression will result in an easily detected change in phenotype.
Constructs utilizing the ASTH1 promoter region, e.g. SEQ ID NO:1; SEQ ID NO:14,
in combination with a reporter gene or with the coding region of ASTH1J or ASTH1I
are also of interest.

The modified cells or animals are useful in the study of ASTH1 function and
regulation. Animals may be used in functional studies, drug screening, etc., e.g. to
determine the effect of a candidate drug on asthma. A series of small deletions
and/or substitutions may be made in the ASTH1 gene to determine the role of
different exons in DNA binding, transcriptional regulation, etc. By providing
expression of ASTH1 protein in cells in which it is otherwise not normally produced,
one can induce changes in cell behavior. These animals are also useful for
exploring models of inheritance of asthma, e.g. dominant v. recessive; relative
effects of different alleles and synergistic effects between ASTH1I and ASTH1J and
other asthma genes elsewhere in the genome.

DNA constructs for homologous recombination will comprise at least a
portion of the ASTH1 gene with the desired genetic modification, and will include
regions of homology to the target locus. DNA constructs for random integration
need not include regions of homology to mediate recombination. Conveniently,
markers for positive and negative selection are included. Methods for generating
cells having targeted gene modifications through homologous recombination are
known in the art. For various techniques for transfecting mammalian cells, see
Keown et al. (1990) Methods in Enzymology 185:527-537.

For embryonic stem (ES) cells, an ES cell line may be employed, or 

30 embryonic cells may be obtained freshly from a host, e.g. mouse, rat, guinea pig, 
etc. Such cells are grown on an appropriate fibroblast-feeder layer or grown in the 
presence of appropriate growth factors, such as leukemia inhibiting factor (LIF). 
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When ES cells have been transformed, they may be used to produce transgenic 
animals. After transformation, the cells are plated onto a feeder layer in an 
appropriate medium. Cells containing the construct may be detected by employing 
a selective medium. After sufficient time for colonies to grow, they are picked and
analyzed for the occurrence of homologous recombination or integration of the
construct. Those colonies that are positive may then be used for embryo
manipulation and blastocyst injection. Blastocysts are obtained from 4 to 6 week
old superovulated females. The ES cells are trypsinized, and the modified cells are
injected into the blastocoel of the blastocyst. After injection, the blastocysts are
returned to each uterine horn of pseudopregnant females. Females are then
allowed to go to term and the resulting litters screened for mutant cells having the
construct. By providing for a different phenotype of the blastocyst and the ES cells, 
chimeric progeny can be readily detected. 

The chimeric animals are screened for the presence of the modified gene
and males and females having the modification are mated to produce homozygous
progeny. If the gene alterations cause lethality at some point in development, 
tissues or organs can be maintained as allogeneic or congenic grafts or transplants, 
or in in vitro culture. 

Investigation of genetic function may utilize non-mammalian models,
particularly using those organisms that are biologically and genetically
well-characterized, such as C. elegans, D. melanogaster and S. cerevisiae. For
example, transposon (Tc1) insertions in the nematode homolog of an ASTH1 gene
or promoter region may be made. The subject gene sequences may be used to
knock-out or to complement defined genetic lesions in order to determine the
physiological and biochemical pathways involved in ASTH1 function. A number of
human genes have been shown to complement mutations in lower eukaryotes.

Drug screening may be performed in combination with the subject animal 
models. Many mammalian genes have homologs in yeast and lower animals. The 
study of such homologs' physiological role and interactions with other proteins can 
facilitate understanding of biological function. In addition to model systems based
on genetic complementation, yeast has been shown to be a powerful tool for
studying protein-protein interactions through the two hybrid system described in
Chien et al. (1991) P.N.A.S. 88:9578-9582. Two-hybrid system analysis is of
particular interest for exploring transcriptional activation by ASTH1 proteins. 

Drug Screening Assays 
By providing for the production of large amounts of ASTH1 protein, one can
identify ligands or substrates that bind to, modulate or mimic the action of ASTH1.
One area of investigation is the development of asthma treatments. Drug screening
identifies agents that provide a replacement or enhancement for ASTH1 function in
affected cells. Conversely, agents that reverse or inhibit ASTH1 function may
stimulate bronchial reactivity. Of particular interest are screening assays for agents
that have a low toxicity for human cells. A wide variety of assays may be used for
this purpose, including labeled in vitro protein-protein binding assays, protein-DNA
binding assays, electrophoretic mobility shift assays, immunoassays for protein
binding, and the like. The purified protein may also be used for determination of
three-dimensional crystal structure, which can be used for modeling intermolecular
interactions, transcriptional regulation, etc.

The term "agent" as used herein describes any molecule, e.g. protein or 
pharmaceutical, with the capability of altering or mimicking the physiological
function of ASTH1. Generally a plurality of assay mixtures are run in parallel with
different agent concentrations to obtain a differential response to the various
concentrations. Typically, one of these concentrations serves as a negative control,
i.e. at zero concentration or below the level of detection. 

Candidate agents encompass numerous chemical classes, though typically 
they are organic molecules, preferably small organic compounds having a 
molecular weight of more than 50 and less than about 2,500 daltons. Candidate
agents comprise functional groups necessary for structural interaction with proteins,
particularly hydrogen bonding, and typically include at least an amine, carbonyl,
hydroxyl or carboxyl group, preferably at least two of the functional chemical
groups. The candidate agents often comprise cyclical carbon or heterocyclic
structures and/or aromatic or polyaromatic structures substituted with one or more
of the above functional groups. Candidate agents are also found among
biomolecules including, but not limited to: peptides, saccharides, fatty acids,
steroids, purines, pyrimidines, derivatives, structural analogs or combinations
thereof. 
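
By way of illustration only, the preferred candidate-agent criteria stated above (molecular weight of more than 50 and less than about 2,500 daltons, and preferably at least two of the named functional groups) could be applied as a simple pre-filter over a compound list. The following is a minimal sketch, not part of the described method: the `Compound` record, its field names and the example entries are hypothetical, and treating "about 2,500" as a hard cutoff is an assumption made for the example.

```python
from dataclasses import dataclass

# Functional groups named in the text as typical of candidate agents.
FUNCTIONAL_GROUPS = {"amine", "carbonyl", "hydroxyl", "carboxyl"}


@dataclass(frozen=True)
class Compound:
    name: str            # hypothetical identifier
    mol_weight: float    # daltons
    groups: frozenset    # functional groups present in the molecule


def is_preferred_candidate(c: Compound, min_groups: int = 2) -> bool:
    """True if the compound meets the weight range and group count."""
    in_range = 50 < c.mol_weight < 2500  # hard cutoff is an assumption
    n_groups = len(c.groups & FUNCTIONAL_GROUPS)
    return in_range and n_groups >= min_groups


# Hypothetical library entries, for illustration only.
library = [
    Compound("example_A", 180.2, frozenset({"hydroxyl", "carbonyl"})),
    Compound("example_B", 18.0, frozenset({"hydroxyl"})),
    Compound("example_C", 3100.0, frozenset({"amine", "carboxyl"})),
]
preferred = [c.name for c in library if is_preferred_candidate(c)]
```

Because the text describes these criteria as typical or preferred rather than required, such a filter would prioritize compounds for screening and would not exclude biomolecules or larger agents from consideration.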

Candidate agents are obtained from a wide variety of sources including 
libraries of synthetic or natural compounds. For example, numerous means are
available for random and directed synthesis of a wide variety of organic compounds
and biomolecules, including expression of randomized oligonucleotides and
oligopeptides. Alternatively, libraries of natural compounds in the form of bacterial,
fungal, plant and animal extracts are available or readily produced. Additionally,
natural or synthetically produced libraries and compounds are readily modified
through conventional chemical, physical and biochemical means, and may be used
to produce combinatorial libraries. Known pharmacological agents may be 
subjected to directed or random chemical modifications, such as acylation, 
alkylation, esterification, amidification, etc. to produce structural analogs. 

Where the screening assay is a binding assay, one or more of the molecules
may be joined to a label, where the label can directly or indirectly provide a
detectable signal. Various labels include radioisotopes, fluorescers,
chemiluminescers, enzymes, specific binding molecules, particles, e.g. magnetic
particles, and the like. Specific binding molecules include pairs, such as biotin and
streptavidin, digoxin and antidigoxin, etc. For the specific binding members, the
complementary member would normally be labeled with a molecule that provides
for detection, in accordance with known procedures. 

A variety of other reagents may be included in the screening assay. These 
include reagents like salts, neutral proteins, e.g. albumin, detergents, etc. that are
used to facilitate optimal protein-protein binding and/or reduce non-specific or
background interactions. Reagents that improve the efficiency of the assay, such
as protease inhibitors, nuclease inhibitors, anti-microbial agents, etc. may be used.
The mixture of components is added in any order that provides for the requisite
binding. Incubations are performed at any suitable temperature, typically between 4
and 40°C. Incubation periods are selected for optimum activity, but may also be
optimized to facilitate rapid high-throughput screening. Typically between 0.1 and 1
hours will be sufficient.

Other assays of interest detect agents that mimic ASTH1 function. For
example, candidate agents are added to a cell that lacks functional ASTH1, and
screened for the ability to reproduce ASTH1 function in a functional assay.

The compounds having the desired pharmacological activity may be
administered in a physiologically acceptable carrier to a host for treatment of
asthma attributable to a defect in ASTH1 function. The compounds may also be
used to enhance ASTH1 function. The therapeutic agents may be administered in
a variety of ways: orally, topically, parenterally, e.g. subcutaneously,
intraperitoneally, by viral infection, intravascularly, etc. Inhaled treatments are of
particular interest. Depending upon the manner of introduction, the compounds
may be formulated in a variety of ways. The concentration of therapeutically active 
compound in the formulation may vary from about 0.1-100 wt.%. 

The pharmaceutical compositions can be prepared in various forms, such as 
granules, tablets, pills, suppositories, capsules, suspensions, salves, lotions and the
like. Pharmaceutical grade organic or inorganic carriers and/or diluents suitable for
oral and topical use can be used to make up compositions containing the
therapeutically-active compounds. Diluents known to the art include aqueous
media, vegetable and animal oils and fats. Stabilizing agents, wetting and
emulsifying agents, salts for varying the osmotic pressure or buffers for securing an
adequate pH value, and skin penetration enhancers can be used as auxiliary
agents. 

Pharmacogenetics 

Pharmacogenetics is the linkage between an individual's genotype and that 
individual's ability to metabolize or react to a therapeutic agent. Differences in
metabolism or target sensitivity can lead to severe toxicity or therapeutic failure by
altering the relation between bioactive dose and blood concentration of the drug. In
the past few years, numerous studies have established good relationships between
polymorphisms in metabolic enzymes or drug targets, and both response and
toxicity. These relationships can be used to individualize therapeutic dose
administration.

Genotyping of polymorphic alleles is used to evaluate whether an individual 
will respond well to a particular therapeutic regimen. The polymorphic sequences
are also used in drug screening assays, to determine the dose and specificity of a
candidate therapeutic agent. A candidate ASTH1 polymorphism is screened with a
target therapy to determine whether there is an influence on the effectiveness in
treating asthma. Drug screening assays are performed as described above.
Typically two or more different sequence polymorphisms are tested for response to
a therapy. 

Drugs currently used to treat asthma include beta 2-agonists, 
glucocorticoids, theophylline, cromones, and anticholinergic agents. For acute, 
severe asthma, the inhaled beta 2-agonists are the most effective bronchodilators.
Short-acting forms give rapid relief; long-acting agents provide sustained relief and
help nocturnal asthma. First-line therapy for chronic asthma is inhaled
glucocorticoids, the only currently available agents that reduce airway inflammation.
Theophylline is a bronchodilator that is useful for severe and nocturnal asthma, but
recent studies suggest that it may also have an immunomodulatory effect.
Cromones work best for patients who have mild asthma: they have few adverse
effects, but their activity is brief, so they must be given frequently. Cysteinyl
leukotrienes are important mediators of asthma, and inhibition of their effects may
represent a potential breakthrough in the therapy of allergic rhinitis and asthma.

Where a particular sequence polymorphism correlates with differential drug
effectiveness, diagnostic screening may be performed. Diagnostic methods have
been described in detail in a preceding section. The presence of a particular 
polymorphism is detected, and used to develop an effective therapeutic strategy for 
the affected individual. 

Experimental

The following examples are put forth so as to provide those of ordinary skill 
in the art with a complete disclosure and description of how to make and use the 
subject invention, and are not intended to limit the scope of what is regarded as the 
invention. Efforts have been made to ensure accuracy with respect to the numbers
used (e.g. amounts, temperature, concentrations, etc.) but some experimental
errors and deviations should be allowed for. Unless otherwise indicated, parts are
parts by weight, molecular weight is average molecular weight; temperature is in
degrees centigrade; and pressure is at or near atmospheric. 



MATERIALS AND METHODS 
Asthma families for genetic mapping studies

Asthma phenotype measurements and blood samples were obtained from
the inhabitants of Tristan da Cunha, an isolated island in the South Atlantic, and
from asthma families in Toronto, Canada (see Zamel et al. (1996) supra). The 282
inhabitants of Tristan da Cunha form a single large extended family descended from
28 original founders. Settlement of Tristan da Cunha occurred beginning in 1817
with soldiers who remained behind when a British garrison was withdrawn from the
island, followed by the survivors of several shipwrecks. In 1827 five women from
St. Helena, one with children, emigrated to Tristan da Cunha and married island
men. One of these women is said to have been asthmatic, and could be the origin
of a genetic founder effect for asthma in this population. Inbreeding has resulted in
kinship resemblances of at least first cousin levels for all individuals. 

The Tristan da Cunha family pedigrees were ascertained through review of 
baptismal, marriage and medical records, as well as reliably accurate historical 
records of the early inhabitants (Zamel (1995) Can. Respir. J. 2:18). The
prevalence of asthma on Tristan da Cunha is high; 23% had a definitive diagnosis
of asthma. 

The Toronto cohort included 59 small families having at least one affected 
individual. These were ascertained based on the following criteria: (i) an affected 
proband; (ii) availability of at least one sibling of the proband, either affected or
unaffected; (iii) at least one living parent from whom DNA could be obtained. A set
of 156 "triad" families, each consisting of an affected proband and his or her parents, was
also collected. Signed consent forms were obtained from each individual prior to
commencement of phenotyping and blood sample collection. The Toronto patients 
were mainly of mixed European ancestry. 

Clinical characterization

A standardized questionnaire based on that of the American Thoracic 
Society (American Lung Association recommended respiratory diseases 
questionnaire for use with adults and children in epidemiology research. 1978. 
American Review of Respiratory Disease 118(2):7-53) was used to record the
presence of respiratory symptoms such as cough, sputum and wheezing; the
presence of other chest disorders including recent upper respiratory tract infection,
allergic history; asthmatic attacks including onset, offset, confirmation by a
physician, prevalence, severity and precipitating factors; other illnesses and
smoking history; and all medications used within the previous 3 months. A
physician-confirmed asthmatic attack was the principal criterion for a diagnosis of
asthma. 

Skin atopy was determined by skin prick tests to common allergens:
A. fumigatus, Cladosporium, Alternaria, egg, milk, wheat, tree, dog, grass, horse,
house dust, cat, feathers, house dust mite D. farinae, and house dust mite
D. pteronyssinus. Atopy testing of Toronto subjects omitted D. pteronyssinus and
added cockroach and ragweed allergens. Saline and histamine controls were also
performed (Bencard Laboratories, Mississauga, Ontario). Antihistamines were
withdrawn for at least 48 hours prior to testing. Wheal diameters were corrected by
subtraction of the saline control wheal diameter, and a corrected wheal size of >3
mm recorded 10 min after application was considered a positive response.
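
The wheal correction and positivity rule just described can be expressed as a short calculation. The sketch below is illustrative only: the function names and the example readings are hypothetical, and scoring a subject as atopic on one or more positive tests follows the definition given in the linkage analysis section of this description.

```python
def corrected_wheal_mm(allergen_mm: float, saline_mm: float) -> float:
    """Wheal diameter after subtracting the saline control wheal."""
    return allergen_mm - saline_mm


def positive_allergens(readings_mm: dict, saline_mm: float,
                       threshold_mm: float = 3.0) -> list:
    """Allergens whose corrected wheal exceeds the threshold (> 3 mm)."""
    return [
        allergen
        for allergen, diameter in readings_mm.items()
        if corrected_wheal_mm(diameter, saline_mm) > threshold_mm
    ]


# Hypothetical readings (mm) for one subject, taken 10 min after application.
readings = {"cat": 6.0, "grass": 3.5, "egg": 1.0}
positives = positive_allergens(readings, saline_mm=1.0)

# Atopy: one or more positive skin prick tests.
is_atopic = len(positives) >= 1
```

In this example only the cat allergen is positive: the grass reading of 3.5 mm falls to 2.5 mm after subtraction of the 1.0 mm saline wheal, which does not exceed 3 mm.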

Airway responsiveness was assessed by a methacholine challenge test in
those subjects with a baseline FEV1 (forced expiratory volume in one second) >
70% of predicted (Crapo et al. (1981) Am. Rev. Respir. Dis. 123:659).
Methacholine challenge response was determined using the tidal breathing method
(Cockcroft et al. (1977) Clin. Allergy 7:235). Doubling doses of methacholine from
0.03 to 16 mg/ml were administered using a Wright nebulizer at 4-min intervals to
measure the provocative concentration of methacholine producing a 20% fall in
FEV1 (PC20). If FEV1 was <70% of predicted, a bronchodilator response to 400
µg salbutamol aerosol was used to determine airway responsiveness. Both
methacholine challenges and bronchodilator responses were measured using a
computerized bronchial challenge system (S&M Instrument Co. Inc., Doylestown, PA)
consisting of a software package and interface board installed in a Toshiba T1850C
laptop computer and connected to a flow sensor (RS232FS). The power source for
instruments used on Tristan da Cunha has been described (Zamel et al. (1996)
supra). Increased airway responsiveness was defined as a PC20 < 4.0 mg/ml or a
> 15% improvement in FEV1 15 min postbronchodilator. Participants were asked to
withhold bronchodilators at least 8 h before testing; inhaled or systemic steroids
were maintained at the usual dosage. Subjects with a history of an upper
respiratory tract infection within a month of testing were rechallenged at a later date.
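
For illustration, the responsiveness criteria above may be sketched as follows. The description does not state how PC20 was computed from the dose-response data; the log-linear interpolation between the last two concentrations used here is a commonly applied approach and is an assumption, as are the function names and the example values.

```python
import math


def pc20_log_interpolation(c1: float, c2: float,
                           fall1: float, fall2: float) -> float:
    """Interpolate PC20 (mg/ml) on a log-concentration scale.

    c1, fall1: second-to-last concentration and its % fall in FEV1 (< 20)
    c2, fall2: last concentration and its % fall in FEV1 (>= 20)
    """
    if not (fall1 < 20.0 <= fall2):
        raise ValueError("the 20% fall must be bracketed by the two doses")
    log_pc20 = math.log10(c1) + (
        (math.log10(c2) - math.log10(c1)) * (20.0 - fall1) / (fall2 - fall1)
    )
    return 10 ** log_pc20


def increased_airway_responsiveness(pc20_mg_ml=None,
                                    fev1_improvement_pct=None) -> bool:
    """PC20 < 4.0 mg/ml, or > 15% FEV1 improvement after bronchodilator."""
    if pc20_mg_ml is not None and pc20_mg_ml < 4.0:
        return True
    if fev1_improvement_pct is not None and fev1_improvement_pct > 15.0:
        return True
    return False


# Hypothetical subject: 10% fall at 1 mg/ml and 30% fall at 2 mg/ml.
pc20 = pc20_log_interpolation(1.0, 2.0, 10.0, 30.0)
responsive = increased_airway_responsiveness(pc20_mg_ml=pc20)
```

In this hypothetical case the 20% fall lies midway between the two responses, so the interpolated PC20 is the geometric mean of 1 and 2 mg/ml (about 1.41 mg/ml), which is below the 4.0 mg/ml threshold.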

Genotyping

PCR primer pairs were synthesized using an Applied Biosystems 394
automated oligo synthesizer. The forward primer of each pair was labeled with
either FAM, HEX, or TET phosphoramidites (Applied Biosystems). No oligo
purification step was performed.

Genomic DNA was extracted from whole blood. PCR was performed using
PTC100 thermocyclers (MJ Research). Reactions contained 10 mM Tris-HCl, pH
8.3; 1.5-3.0 mM MgCl2; 50 mM KCl; 0.01% gelatin; 250 µM each dGTP, dATP,
dTTP, dCTP; 20 µM each PCR primer; 20 ng genomic DNA; and 0.75 U Taq
Polymerase (Perkin Elmer Cetus) in a final volume of 20 µl. Reactions were
performed in 96 well polypropylene microtiter plates (Robbins Scientific) with an
initial 94°C, 3 min. denaturation followed by 35 cycles of 30 sec. at 94°C, 30 sec. at
the annealing temp., and 30 sec. at 72°C, with a final 2 min. extension at 72°C
following the last cycle. Dye label, annealing temperature, and final magnesium
concentration were specific to the individual marker.
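
As a worked check of the cycling program described above, the programmed hold time can be totalled as in the following sketch. The dictionary layout and names are hypothetical; the figure excludes temperature ramp times, which depend on the instrument.

```python
# Cycling program as stated in the text (all times in seconds).
STANDARD_PROGRAM = {
    "initial_denaturation_s": 3 * 60,   # 94 C, 3 min
    "cycles": 35,
    "steps_per_cycle_s": (30, 30, 30),  # 94 C, annealing temp., 72 C
    "final_extension_s": 2 * 60,        # 72 C, 2 min
}


def programmed_hold_seconds(program: dict) -> int:
    """Sum of programmed hold times, excluding ramping between steps."""
    per_cycle = sum(program["steps_per_cycle_s"])
    return (
        program["initial_denaturation_s"]
        + program["cycles"] * per_cycle
        + program["final_extension_s"]
    )


total_s = programmed_hold_seconds(STANDARD_PROGRAM)
total_min = total_s / 60
```

Under these stated settings the holds total 3,450 seconds, or 57.5 minutes per plate before ramping is taken into account.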

Dye label intensity and quantity of PCR product (as assessed on agarose
gels) were used to determine the amount to be pooled for each marker locus. The
pooled products were precipitated and the product pellets mixed with 0.4 µl
Genescan 500 Tamra size standard, 2 µl formamide, and 1 µl ABI loading dye.
Plates of PCR product pools were heated to 80°C for 5 minutes and immediately
placed on ice prior to gel loading.

PCR products were electrophoresed on denaturing 6% polyacrylamide gels
at a constant 1000 volts using ABI 373a instruments. Peak detection, sizing, and
stutter band filtering were achieved using Genescan 1.2 and Genotyper 1.1
software (Applied Biosystems). Genotype data were subsequently submitted to
quality control and consistency checks (Hall et al. (1996) Genome Res. 6:781).

Genotyping of "saturation" markers in the ASTH1 region was done by the
method described above with several exceptions. In most cases, the unlabeled
primer of each pair was modified with the sequence GTTTCTT at the 5' end (Smith
et al. (1995) Genome Res. 5:312). Amplitaq Gold (Perkin Elmer Cetus) and buffer D
(2.5 mM MgCl2, 33.5 mM Tris-HCl pH 8.0, 8.3 mM (NH4)2SO4, 25 mM KCl, 85 µg/ml
BSA) were used in the PCR. A 'touchdown' amplification profile was employed in
which the annealing temperature began at 66°C and decreased one degree per
cycle to a final 20 cycles at 56°C. Products were run on 4.25% polyacrylamide gels
using ABI 377 instruments. The data were processed with Genescan 2.1 and
Genotyper 1.1 software.
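
The touchdown profile described above can be written out as a per-cycle list of annealing temperatures. This is a minimal sketch under one reading of the text, namely that the step-down runs from 66°C to 57°C at one cycle per degree and is followed by exactly 20 cycles at 56°C; the function name is hypothetical.

```python
def touchdown_annealing_temps(start_c: int = 66, final_c: int = 56,
                              final_cycles: int = 20) -> list:
    """Annealing temperature (deg C) for each cycle of the profile."""
    # One cycle at each temperature from start_c down to final_c + 1.
    step_down = list(range(start_c, final_c, -1))
    # Followed by the stated number of cycles at the final temperature.
    return step_down + [final_c] * final_cycles


temps = touchdown_annealing_temps()
n_cycles = len(temps)
```

On this reading the profile comprises 30 cycles in total: ten step-down cycles followed by the 20 cycles at 56°C.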

The Genome Scan 

A genome scan was performed in the population of Tristan da Cunha using
274 polymorphic microsatellite markers chosen from among those developed at
Oxford (Reed et al. (1994) Nature Genetics 7:390), Genethon (Dib et al. (1996)
Nature 380:152) and the Cooperative Human Linkage Center (CHLC, Murray et al.
(1994) Science 265:2049). Markers with heterozygosity values of 0.75 or greater
were selected to cover all the human chromosomes, as well as for ease of
genotyping and size of PCR product for multiplexing of markers on gels. Fifteen
multiplexed sets were used to provide a ladder of PCR products in each of three
dyes when separated by size. Published distances were used initially to estimate
map resolution. More accurate genetic distances were calculated using the study
population as the data were generated. The 274 markers gave an average 14 cM
interval for the genome scan.
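
The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below is illustrative only; it treats the number of intervals as equal to the number of markers, which is an approximation, and it assumes markers were spread evenly across sets and dyes, which the text does not state.

```python
n_markers = 274
mean_interval_cm = 14
n_multiplex_sets = 15
n_dyes = 3

# Approximate genetic map length implied by the marker spacing (cM).
implied_map_length_cm = n_markers * mean_interval_cm

# Average multiplex load, assuming an even spread (an assumption).
markers_per_set = n_markers / n_multiplex_sets
markers_per_dye_per_set = markers_per_set / n_dyes
```

The implied map length is about 3,800 cM, and each multiplexed set carries on average about 18 markers, or roughly 6 per dye.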

Linkage analysis

Parametric linkage analyses of marker data were conducted using the
methods of Haseman and Elston (1972) Behav. Genet. 2:3, and FASTLINK
(Schaffer et al. (1996) Hum. Hered. 46:226), assuming a dominant mode of
transmission with incomplete penetrance. Linkage to three primary phenotypes,
asthma diagnosis (history), airway responsiveness (PC20 < 4 mg/ml for
methacholine challenge) and atopy (one or more skin-prick tests which yielded a
wheal diameter > 3 mm), and to combinations of these, was tested.
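
To make the quantity reported by a parametric linkage analysis concrete, the following sketch computes a two-point LOD score for the simplest textbook case of phase-known, fully informative meioses. It is not the FASTLINK calculation used here, which evaluates likelihoods over whole pedigrees and accommodates incomplete penetrance; the function names and the example counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 likelihood ratio of linkage at theta versus theta = 0.5."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int):
    """Grid search over theta = 0.01 ... 0.50 for the maximum LOD."""
    thetas = [i / 100 for i in range(1, 51)]
    best = max(thetas, key=lambda t: lod_score(recombinants, meioses, t))
    return best, lod_score(recombinants, meioses, best)


# Hypothetical data: 1 recombinant observed in 10 informative meioses.
best_theta, best_lod = max_lod(1, 10)
```

For these hypothetical counts the maximum falls at a recombination fraction of 0.10 with a LOD score of about 1.6.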

Small scale yeast artificial chromosome (YAC) DNA preparation

Small scale isolation of YAC DNA for STS mapping was done by a procedure 
which uses glass beads and physical shearing to damage the yeast cell wall 
(Scherer and Tsui (1991) Cloning and analysis of large DNA molecules. In
Advanced Techniques in Chromosome Research. (K.W. Adolph, ed.) pp. 33-72.
Marcel Dekker, Inc. New York, Basel, Hong Kong).

YAC block prep and pulsed field gel electrophoresis (PFGE) 

A 50 ml culture of each YAC was grown in 2 x AHC at 30°C. The cells were
pelleted by centrifugation and washed twice in sterile water. After resuspension of
the cells in 4 ml of SCEM (1 M sorbitol, 0.1 M sodium citrate (pH 5.8), 10 mM
EDTA, 30 mM β-mercaptoethanol), 5 ml of 1.2% low melting temperature agarose
in SCEM was added, mixed, pipetted into 100 µl plug molds and allowed to solidify.
Plugs were incubated overnight in 50 ml of SCEM containing 30 U/ml lyticase
(Sigma). Plugs were rinsed 3 times in TE (10 mM Tris pH 8.0, 1 mM EDTA) and
incubated twice for 12 hours each at 50°C in lysis solution (0.5 M EDTA, pH 8.0;
1% w/v sodium lauryl sarcosine; 0.5 mg/ml proteinase K). They were washed 5
times with TE and stored in 0.5 M EDTA (pH 8.0) at 4°C.

YACs and yeast chromosomes were separated on pulsed field gels using a
CHEF Mapper (BIO-RAD) according to methods supplied by the manufacturer,
then transferred to nitrocellulose. YACs which comigrated with yeast chromosomes
were visualized by hybridization of the blot with radiolabeled YAC vector
sequences (Scherer and Tsui (1991) supra).

Hybridization of YAC DNA to bacterial artificial chromosome (BAC) and cosmid
grids

Size-purified YAC DNA was prepared by pulsed field gel electrophoresis on a
low melting temperature Seaplaque GTG agarose (FMC) gel, purified by
GeneClean (BIO101) and radiolabeled for 30 mins with 32P-dCTP using the Prime-It
II kit (Stratagene). 50 µl of water was added and unincorporated nucleotide was
removed by Quick Spin Column (Boehringer Mannheim). 23 µl of 11.2 mg/ml
human placental DNA (Sigma) and 36 µl of 0.5 M Na2HPO4, pH 6.0 were added to
the approximately 150 µl of eluant. The probe was boiled for 5 mins and incubated
at 65°C for exactly 3 hours, then added to the prehybridized gridded BAC (Shizuya
et al. (1992) Proc. Natl. Acad. Sci. 89:8794; purchased from Research Genetics) or
chromosome 11 cosmid [Resource Center/Primary Database of the German
Human Genome Project, Berlin; Lehrach et al. (1990), In Davies, K.E. and
Tilghman, S.M. (eds.). Genome Analysis Volume 1: Genetic and Physical Mapping.
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, pp. 39-81] filters in
dextran sulfate hybridization mix (10% dextran sulfate, 1% SDS, 1 M NaCl).
Hybridizations were at 65°C for 12 - 48 hours, followed by 2 washes at room
temperature in 2X SSC for 10 mins each, and 3 washes at 65°C in 0.2X SSC, 0.2%
SDS for 20 mins each.


Metaphase fluorescence in situ hybridization (FISH) and direct visual in situ 
hybridisation (DIRVISH) 

Metaphase FISH was carried out by standard methods (Heng and Tsui
(1994) FISH detection on DAPI banded chromosomes. In Methods of Molecular
Biology: In Situ Hybridisation Protocols (K.H.A. Choo, ed.) pp. 35-49. Humana Press,
Clifton, NJ.). High resolution FISH, or DIRVISH, was used to map the relative
positions of two or more clones on genomic DNA. The protocol used was as
described by Parra and Windle (1993) Nature Genet. 5:17. Briefly, slides
containing stretched DNA were prepared by adding 2 µl of a suspension of normal
human lymphoblast cells at one end of a glass slide and allowing to dry. 8 µl lysis
buffer (0.5% SDS, 50 mM EDTA, 200 mM Tris-HCl, pH 7.4) was added and the
slide incubated at room temperature for 5 minutes. The slide was tilted so that the
DNA ran down the slide, then dried. The DNA was fixed by adding 400 µl 3:1
methanol/acetic acid. Probes were labeled either with biotin or with digoxigenin by
standard nick translation (Rigby et al. (1977) J. Mol. Biol. 113:237). Hybridization
and detections were carried out using standard fluorescence in situ hybridization
techniques (Heng and Tsui (1994) supra). Results were visualised using a
Mikrophot SA microscope (Nikon) equipped with a CCD camera (Photometrics).
Images were recorded using Smartcapture software (Vysis).

Gap filling

Clones flanking gaps in the map were end cloned by digestion with enzymes
that do not cut the respective vector sequences (NsiI for BAC clones and XbaI for
PAC clones), followed by religation and transformation into competent DH5α.
Clones which produced two end fragments and plasmid vector upon digestion with
NotI and NsiI or XbaI were sequenced. Gaps in the tiling path were filled by
screening a gridded BAC library with the end clone probes or by screening DNA
pools of a human genomic PAC library (Ioannou et al. (1994) Nature Genetics 6:84;
licensed from Health Research, Inc.) by PCR using primers designed from end
clone sequences.


Direct cDNA selection 

Direct cDNA selection (Lovett et al. (1991) Proc. Natl. Acad. Sci. 88:9628)
was carried out using cDNA derived from both adult whole lung tissue and fetal
whole lung tissue (Clontech). 5 µg of Poly(A)+ RNA was converted to double
stranded cDNA using the Superscript Choice System for cDNA synthesis and the
supplied protocol (Gibco BRL). First strand priming was achieved by both oligo(dT)
and random hexamers. The resulting cDNA was split into 2 equal aliquots and
digested with either MboI or TaqI prior to the addition of specific linker primers.
Linker primers for MboI-digested DNA were as described by Morgan et al. (1992)
Nucleic Acids Res. 20:5173. Linker primers for TaqI-digested DNA were a
modification of these:
(SEQ ID NO:336) TaqIa: 5'-CGAGAATTCACTCGAGCATCAGG;
(SEQ ID NO:337) TaqIb: 5'-CCTGATGCTCGAGTGAATTCT. The modified cDNA
was ethanol precipitated and resuspended in 200 µl of H2O. 1 µl of cDNA was
amplified with the linker primer MboIb in a 100 µl PCR reaction. The resulting
cDNA products, approximately 1 µg, were blocked with 1 µg of COT1 DNA (Gibco
BRL) for 4 hours at 60°C in 120 mM NaPO4 buffer, pH 7.0.
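
The relationship between the two TaqI linker oligonucleotides listed above can be checked computationally. The sketch below, in which the helper and variable names are hypothetical, confirms that the two oligonucleotides anneal over 21 bases and leave a two-base 5' extension of sequence CG, which matches the 5'-CG overhang produced by TaqI cleavage (T^CGA).

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Linker sequences as given in the text (5' to 3').
TAQ1A = "CGAGAATTCACTCGAGCATCAGG"   # SEQ ID NO:336
TAQ1B = "CCTGATGCTCGAGTGAATTCT"     # SEQ ID NO:337

# Portion of the longer oligonucleotide that pairs with the shorter one.
duplex_region = reverse_complement(TAQ1B)
anneals_at_3prime_end = TAQ1A.endswith(duplex_region)

# Unpaired 5' extension of the longer oligonucleotide.
overhang = TAQ1A[: len(TAQ1A) - len(TAQ1B)]
```

The annealed region also contains the recognition sequences GAATTC (EcoRI) and CTCGAG (XhoI), as can be seen by inspection of SEQ ID NO:336.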

Approximately 1 µg of the appropriate genomic clones was biotinylated using
the BioNick Labeling System (Gibco BRL). Unincorporated biotin was removed by
spin column chromatography. Approximately 100 ng of biotinylated genomic DNA
was denatured and allowed to hybridize to 1 µg of blocked cDNA in a total volume
of 20 µl in 120 mM NaPO4 for 60 hours at 60°C under mineral oil. After
hybridization, the biotinylated DNA was captured on streptavidin-coated magnetic
beads (Dynal) in 100 µl of binding buffer (1 M NaCl, 10 mM Tris, pH 7.4, 1 mM
EDTA) for 20 minutes at room temperature with constant rotation. Two 15 minute
washes at room temperature with 500 µl of 1X SSC/0.1% SDS were followed by
four washes for 20 minutes at 65°C with 500 µl of 0.1X SSC/0.1% SDS with
constant rotation. After each wash, the beads were collected on the side of the tube
using magnet separation and the supernatant was removed with a pipette.
Following the last wash, the beads were briefly rinsed once with wash solution prior
to eluting the bound cDNA with 50 µl of 0.1 M NaOH for 10 minutes at room
temperature. The supernatant was removed and neutralized with 50 µl 1 M Tris pH
7.4. The primary selected cDNA was desalted using a Sephadex G-50 column
(Boehringer Mannheim). PCR was performed on 1, 2, 5, and 10 µl of eluate with
MboIb primers. Amplified products were analyzed on a 1.4% agarose gel. The
reaction with the cleanest bands and least background was scaled up to produce
approximately 1 µg of primary selected cDNA. This amplified primary selected
cDNA was blocked with 1 µg of COT1 at 60°C for 1 hour followed by a second
round of hybridization to 100 ng of the appropriate genomic DNA under the same
conditions as the first round of selection. Washing of the bound cDNA, elution, and
PCR of the selected cDNA were identical to the first round. 1 µl of PCR amplified
secondary selected cDNA was cloned using the TA cloning system according to the
manufacturer's protocol (Invitrogen). Colonies were picked into 96-well microtiter
plates and grown overnight prior to sequencing.



Exon Trapping 

Exon trapping was performed by the method of Buckler et al. (1991, Proc.
Natl. Acad. Sci. USA 88:4005) with modifications described in Church et al. (1994)
Nature Genetics 6:98. Each BAC clone of the minimal set of clones required to
cover the ASTH1 region (i.e. the tiling path) was subject to exon trapping
separately. Briefly, restriction fragments (PstI or BamHI/BglII) of each cosmid were
shotgun subcloned into PstI- or BamHI-digested and phosphatase-treated pSPL3B
which had been modified as in Burns et al. (1995) Gene 161:183 (GIBCO BRL).
Ligations were electroporated into ElectroMax HB101 cells (Gibco BRL) and plated
on 20 cm diameter LB ampicillin plates. DNA was prepared from plates with > 2000
colonies by collection of the bacteria in LB ampicillin liquid and plasmid DNA
purification by a standard alkaline lysis protocol (Sambrook et al. (1989) supra). 5
ng of DNA from each plasmid pool preparation were electroporated into Cos 7 cells
(ATCC) and RNA harvested using TRIZOL (Gibco BRL) after 48 hours of growth.
RT-PCR products were digested with BstXI prior to a second PCR amplification.
Products were cloned into pAMP10 (Gibco BRL) and transformed into DH5 cells
(Gibco BRL). 96 colonies per BAC were picked and analyzed for insert size by
PCR.

Northern blot hybridisation 

Northern hybridisation was performed using Multiple Tissue Northern (MTN)
blots (Clontech). DNA probes were radioactively labeled by random priming
[Feinberg and Vogelstein (1984) Anal. Biochem. 137:266] using the Prime-It II kit
(Stratagene). Hybridizations were performed in ExpressHyb hybridisation solution 
(Clontech) according to the manufacturer's recommendations. Filters were exposed 
to autoradiographic film overnight or for 3 days. 

cDNA library screening

Phage cDNA libraries were plated and screened with radiolabeled probes 
(exon trapping or cDNA selection products amplified by PCR from plasmids 
containing these sequences) by standard methods (Sambrook et al. (1989) supra).

Rapid amplification of cDNA ends (RACE)

RACE libraries were constructed using polyA+ RNA and the Marathon cDNA 
amplification kit (Clontech). Nested RACE primer sets were designed for each 
cDNA or potential gene fragment (trapped exon, predicted exon, conserved 
fragment, etc). The RACE libraries were tested by PCR using one primer pair for 
each potential gene fragment; the two strongly positive libraries were chosen for 
RACE experiments. 

Genomic sequencing 

DNA from cosmid, PAC, and BAC clones was prepared using Qiagen DNA
prep kits and further purified by CsCl gradient. DNA was sonicated and DNA
fragments were repaired using nuclease BAL-31 and T4 DNA polymerase. DNA
fragments of 0.8-2.2 kb were size-fractionated by agarose gel electrophoresis and 
ligated into pUC9 vector. Inserts of the plasmid clones were amplified by PCR and 
sequenced using standard ABI dye-primer chemistry. 

ABI sample file data was reanalyzed using Phred (Phil Green, University of 
Washington) for base calling and quality analysis. Sequence assembly of 
reanalyzed sequence data was accomplished using Phrap (Phil Green, University 
of Washington). Physical gaps between assembled contigs and unjoined but 
overlapping contigs were identified by inspection of the assembled data using GFP 
(licensed from Baylor College of Medicine) and Consed (Phil Green, University of 
Washington). Material for sequence data generation across gaps was obtained by 
PCR amplification. Low coverage regions were resequenced using dye-primer and 
dye-terminator chemistries (ABI). Final base-perfect editing (to > 99% accuracy) 
was accomplished using Consed. 
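For reference, Phred expresses base-call quality on a logarithmic scale, Q = -10 log10(P), where P is the estimated probability that the call is wrong. The sketch below shows only that conversion; the function names are illustrative and are not part of Phred, Phrap or Consed.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score Q into an estimated error probability."""
    return 10 ** (-q / 10.0)


def error_probability_to_phred(p):
    """Convert an estimated error probability into a Phred quality score."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


# A base call that is 99% accurate (1% error) corresponds to about Q20.
q_for_99_percent = error_probability_to_phred(0.01)
```

Under this scale, each additional 10 quality units corresponds to a tenfold lower error probability.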
Single stranded conformational polymorphism (SSCP) analysis

PCR primers flanking each exon of the ASTH1I and ASTH1J genes, or more
than one primer pair for large exons, were designed from the genomic sequence
using Primer (publicly available from the Whitehead Institute for
Biomedical Research) or Oligo 4.0 (licensed from National Biosciences).
Radioactive SSCP was performed by the method of Orita et al. (1989, Proc. Natl.
Acad. Sci. 86:2766). Briefly, radioactively labeled PCR products between 150 and
300 bp and spanning exons of the ASTH1I and ASTH1J genes were generated
from a set of asthma patient and control genomic template DNAs, by incorporating
α-32P dCTP in the PCR. PCR reactions (20 µl) included 1x reaction buffer, 100 µM
dNTPs, 1 µM each forward and reverse primer, 1 unit Taq DNA polymerase
(Perkin-Elmer) and 1 µCi α-32P dCTP. A brief denaturation at 94°C was followed by
30-32 cycles of: 94°C for 30 sec, 30 sec at the annealing temperature, and 72°C for
30 sec; followed by 5 min at 72°C. Radiolabeled PCR products were diluted 1:20 in
water, mixed with an equal volume of denaturing loading dye (95% formamide,
0.25% bromophenol blue), and denatured for 10 minutes at 80°C immediately prior
to electrophoresis. 0.5x MDE (FMC) gels with and without 8% glycerol in 1x TBE
were run at 8-12 Watts for 16-20 hours at room temperature. Dried gels were
exposed to autoradiographic film (Kodak XAR) for 1-2 days at -80°C. PCR products
from individuals carrying SSCP variants were subcloned into the pCR2.1 or
pZeroBlunt plasmid vector (Invitrogen). Inserts of the plasmid clones were amplified
by PCR and sequenced using standard ABI dye-primer chemistry to determine the
nature of the sequence variant responsible for the conformational changes detected
by SSCP.

Fluorescent SSCP was carried out according to the recommended ABI
protocol (ABI User Bulletin entitled 'Multi Color Fluorescent SSCP'). Unlabeled
PCR primers were used to amplify genomic DNA segments containing different
exons of the ASTH1I or ASTH1J genes, in patient or control DNA. Nested
fluorescently labeled (TET, FAM or HEX) primers were then used to amplify smaller
products, 150 to 300 bp, containing the exon or region of interest. Amplification was
done using a 'touchdown' PCR protocol, in which the annealing temperature
decreased from 57°C to 42°C, and Amplitaq Gold polymerase (Perkin Elmer,
Cetus). In most cases the fluorescently labeled primers were identical in sequence
to those used for conventional radioactive SSCP. The fluorescent PCR products
were diluted and mixed with denaturing agents, GeneScan size standard
(GeneScan 500 labeled with TAMRA) and Blue dextran dye. Samples were heated
at 90°C and quick chilled on ice prior to loading on 6.5% standard or 0.5x MDE
(manufacturer) polyacrylamide gels containing 2.5% glycerol and run using
externally temperature controlled modified ABI 377 instruments. Gels were run at
1240 V and 20°C for 7-9 hrs and analyzed using GeneScan software (ABI).



Comparative (heterozygote detection) sequencing 

Unlabeled PCR primers were used to amplify genomic DNA segments
containing different exons of the ASTH1I or ASTH1J genes, from patient or control
DNAs. A set of nested PCR primers was then used to reamplify the fragment.
Unincorporated primers were removed from the PCR product by Centricon-100
column (Amicon), or by Centricon-30 column for products less than 130 bp. The
nested primers and dye terminator sequencing chemistry (ABI PRISM dye
terminator cycle sequencing ready reaction kit) were then used to cycle sequence
the exon and flanking region. Volumes were scaled down to 5 µl and 10% DMSO
added to increase peak height uniformity. Sequences were compared between
samples and heterozygous positions detected by visual inspection of
chromatograms and using Sequence Navigator (licensed from ABI).

For some exons, PCR products were also compared by subcloning and 
sequencing, and comparison of sequences for ten or more clones. 


RESULTS 

Genome scanning and linkage analysis 

A genome scan was performed using polymorphic microsatellite markers
from throughout the human genome, and DNA isolated from blood samples drawn
from the inhabitants of Tristan da Cunha. Linkage analysis, an established
statistical method used to map the locations of genes and markers relative to other
markers, was applied to verify the marker orders and relative distances between
markers on all human chromosomes, in the Tristan da Cunha population. Linkage
analysis can detect cosegregation of a marker with disease, and was used as a
means to detect genes influencing the development of asthma in this population.
The most highly significant linkage in the genome scan (p = 0.0001 for history of
asthma and p = 0.0009 for methacholine challenge) was obtained at D11S907, a
marker on the short arm of chromosome 11. This significant linkage result indicated
that a gene influencing predisposition to asthma in the Tristan da Cunha population
was located near D11S907.

Replication of this finding was obtained in a collection of asthma families
from Toronto, in which D11S907 and several nearby markers were tested for
linkage. The significant linkage seen (p = 0.001 for history of asthma and p = 0.05
for methacholine challenge) supported the mapping of an asthma gene near
D11S907 and indicated that the gene was likely to be relevant in the more diverse
outbred Toronto group as well as in the inbred population of Tristan da Cunha.

The approximate genetic location of the ASTH1 gene in the Tristan da
Cunha population was confirmed by genotyping and analyzing data from several
markers near D11S907, spaced at intervals no greater than 5 cM across a possible
linked region of about 30 cM. Sib-pair and affected pedigree member linkage
analyses of these markers yielded confirmatory evidence for linkage and refined the
genetic interval.

Physical mapping at ASTH1: YAC contig construction 

Yeast artificial chromosome (YAC) clones were derived from the CEPH
megaYAC library (Cohen et al. 1993 Nature 366:698). Individual YAC addresses
were obtained from a public physical map of CEPH megaYAC STS (sequence
tagged site; Olson et al. (1989) Science 245:1434) mapping data maintained by the
Whitehead Institute and accessible through the world wide web (Cohen et al. 1993,
supra; http://www-genome.wi.mit.edu/cgi-bin/contig/phys_map). YAC clones
spanning or overlapping other YACs containing D11S907 were chosen for map
construction; STSs mapping to these YACs were used for map and clone
verification. Some YACs annotated in the public database as being chimeric were
excluded from the analyses. Multiple colonies of each YAC, obtained from a freshly
streaked plate inoculated from the CEPH megaYAC library masterplate, were
scored using STS markers from the ASTH1 region. These markers included
polymorphic microsatellite repeats, expressed sequence tags (ESTs) and STSs.
Comparison of STS mapping data for each clone with the public map allowed
choice of the individual clone which retained the greatest number of ASTH1 region
STSs, and was therefore least likely to be deleted. YAC addresses for which
clones differed in STS content were interpreted to be prone to deletion; those for
which a subset of clones contained no ASTH1 region STSs were presumed to be
contaminated with yeast cells containing a YAC from another region of the genome.
Chimerism of the chosen clones was assessed by metaphase fluorescent in situ
hybridization (FISH). Their sizes were determined by pulsed field gel
electrophoresis (PFGE), Southern blotting and hybridization with a YAC vector
probe. The PFGE analyses also showed that no YAC clone chosen contained
more than one yeast artificial chromosome.

An STS map based on assuming the least number of deletions in the YAC
clones was generated. The STS marker order was in agreement with that of the
Whitehead map. The STS retention pattern of individual YACs, however, was
slightly different from that of the public data. In general, the chosen clones were
positive for a greater number of ASTH1 region markers, showing that the data set was
likely to have fewer false negatives than the public map. Non-chimeric YAC clones
spanning the region of greatest interest were chosen for use as hybridization
probes for the identification of smaller BAC, PAC, P1 or cosmid clones from the
region.

25 Conversion to a plasmid-based clone map 

The YAC map at ASTH1 provided continuous coverage of a 4 Mb region, the
central 1 Mb of which was of greatest interest. YAC clones comprising a minimal
tiling path of this region were chosen, and the size purified artificial chromosomes
were used as hybridization probes to identify BAC and cosmid clones. Gridded
filters of a 3x human genomic BAC library and of a human chromosome 11-specific
cosmid library were hybridized with radiolabeled purified YAC. Clones
corresponding to the grid coordinates of the positives were streaked to colony
purity, and filters gridded with four clones of each BAC or cosmid. These
secondary filters were hybridized with size-purified YAC DNAs. A proportion of both
the BACs and cosmids were found to be non-clonal by these analyses. A positively
hybridizing clone of each was chosen for further analysis.

The BAC and cosmid clones were STS mapped to establish overlaps
between the clones. The BACs were further localized by DIRVISH. BACs which
did not contain an STS marker were mapped in pairwise fashion by simultaneous
two-color DIRVISH with another BAC. The map produced had three gaps which
were subsequently filled by end cloning and hybridization of the end clones to a
human genomic PAC library. Genetic refinement of the ASTH1 region had
occurred concurrently with mapping, rendering it unnecessary to extend the
BAC-contigged region. Mapping data was recorded in ACeDB (Eeckman and Durbin
(1995) Methods Cell Biol. 48:583).

Genomic sequencing and gene prediction

A minimal tiling path of BAC and cosmid clones was chosen for genomic 
sequencing. Over 1 Mb of genomic sequence was generated at ASTH1. On 
average, sequencing was done to 12x coverage (12 times redundancy in 
sequences). Marker order was verified relative to the STS map. 
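The coverage figure quoted above is the total number of sequenced bases divided by the length of the region. A minimal sketch of that arithmetic follows; the read count and read length in the example are hypothetical values chosen only to give 12x, and are not taken from the sequencing data described here.

```python
def mean_coverage(read_lengths, region_length):
    """Average redundancy: total sequenced bases divided by region length."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return sum(read_lengths) / region_length


# Hypothetical example: 24,000 reads of 500 bases over a 1 Mb region.
example_coverage = mean_coverage([500] * 24000, 1_000_000)
```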

BLAST searches (Altschul et al. (1990) supra) were performed to identify
sequences in public databases that were related to those in the ASTH1 region.
Sequence-based gene prediction was done with the GRAIL [Roberts (1991)
Science 254:805] and Geneparser [Snyder and Stormo (1993) Nucleic Acids Res.
21:607] programs. Genomic sequence and feature data was stored in ACeDB.

Development of new microsatellite markers for genetic refinement of the ASTH1 
region 

Additional informative polymorphic markers were important for the genetic
refinement of the ASTH1 region. 'Saturation' cloning of every microsatellite in the
1 Mb region surrounding D11S907 was performed. Plasmid libraries were
constructed from PFGE purified DNA from each YAC, prescreened with a primer
from each known microsatellite marker, then screened with radiolabeled (CA)15 or




WO 99/37809 PCT/US98/01260 

a pool of trinucleotide and tetranucleotide repeat oligonucleotides. The plasmid
inserts were sequenced, and the set of sequences compared with those of the known
microsatellite markers in the region, using Power Assembler (ABI) or Sequencher
(Alsbyte). Primer pairs flanking each novel microsatellite repeat were designed,
and the heterozygosity of each new marker was tested by Batched Analysis of
Genotypes (BAGs; LeDuc et al., 1995, PCR Methods and Applications 4:331).
Additional microsatellites were found by analysis of the genomic sequence in
ACeDB. Table 1 lists all the microsatellite markers used for genotyping in the
ASTH1 region and their repeat type, source and primers. Table 1B lists some
repeat sequences.
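The text does not say how microsatellites were located in the genomic sequence. As one possible illustration only, the sketch below scans a sequence for perfect di-, tri- and tetranucleotide repeats with a regular expression. The function name, the minimum repeat count and the test sequence are assumptions made for this example.

```python
import re


def _is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter unit (e.g. not CACA)."""
    n = len(unit)
    return not any(
        n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n)
    )


def find_microsatellites(seq, min_repeats=8, unit_sizes=(2, 3, 4)):
    """Return sorted (start, unit, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k in unit_sizes:
        # One k-base unit followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if _is_primitive(unit):
                hits.append((match.start(), unit, len(match.group(0)) // k))
    return sorted(hits)


# Hypothetical sequence: a (CA)15 tract between two arbitrary 5-base flanks.
example_hits = find_microsatellites("TTGGT" + "CA" * 15 + "GTTGG")
```

Only perfect repeats are reported; interrupted repeats such as several of those listed in Table 1B would need a more tolerant search.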

TABLE 1

Polymorphic microsatellite markers in the ASTH1 region

SEQ ID  MARKER               PRIMERS

160.    110O5GT1             CTGCTGTGGACGAATAGG
161.                         TCAATATAATCTTGCTTAACTTGG
162.    139C7GT1             GACCTGTTTGGGTTGATTTCAG
163.                         GTTTCTTACAGTGTCTTGCTATCACATCACC
164.    171L24AT1            GAGGACTGGCAGTACCAAGTAAAC
165.                         GTTTCTTTGGTTCATTCTAAGATGGCTGG
166.    253E6GT1             GCTGAGGCAGGAGAAAAGACAAG
167.                         GTTTCTTCATGCAAAGGTCAGGAGGTAGG
168.    253E6TE1             GTTGCTTCCAGACGAGGTACATG
169.                         GTTTCTTCAATGGCTCCACAAACATCTCTG
170.    253E6TR1             AGGTTTAGGGGACAGGGTTTGG
171.                         GTTTCTTTCCTGGCTAACACGGTGAAATC
172.    65P14                GTTTCTTATTGCCTCCTCCCAAAATTC
173.                         AGAGGCCACTGGAAGACGAA
174.    65P14GT1             AACTGGAGTCAGGCAAAACGTG
175.                         GTTTCTTTGGCTGGTAAGGAAAGAAACCAC
176.    65P14TE1             GGCTAGGTTCATAAACTCTGTGCTG
177.                         GTTTCTTGATTGTTTGAGATCCTTGACCCAG
178.    65P14TE2             GCCGAAATCACAACACTGCATC
179.                         GTTTCTTGATTCTGCTCTTACTCTTGCCCC
180.    65P14TR1             GTAATAGAACCAAAGGGCTGAGAC
181.                         GTTTCTTCGGAGTCAGACCTTACATTGTTGAG
182.    774F                 ATCTCCCTGCTACCCACCTT
183.                         GTTTCTTGTTTTCAGTGAGTTTCTGTTGGG
184.    774J                 GTGTGCCAAACAACATTTGC
185.                         GTTTCTTCAAGCCATCAAGCTAGAGTGG
186.    774L                 GGGCTTTTAAACCCTTATTTAACC
187.                         GTTTCTTAGGTGATCTCAGAGCCACTCA
188.    774N                 AGGGCAGGTGGGAACTTACT
189.                         GTTTCTTTGGAGTCAGTTGAGCTTTCTACC
190.    774O                 TGAACTTGCCTACCTCCCAG
191.                         GTTTCTTAGCATATATCCTTACACAAGCACA
192.    774T                 CATGGTTCCAAAGGCAAGTT
193.                         GTTTCTTTTGAGGCTGAATGAGCTGTG
194.    86J5AT2              ACAGGTGGGAAGACTGAATGTC
195.                         GTTTCTTGCAGTACACATCACATGACCTTG
196.    86J5CA1              GAAATAGGCGGAAACTGGTTC
197.                         GTTTCTTCGTTGTGGTTGTTCAGAAAGG
198.    86J5GT1              GGTCAAGTGTTCAGAACGCATC
199.                         GTTTCTTGCAGGGATTATGCTAGGTCTGTAG
200.    86J5GT2              AGCACTTCTGAGGAAGGGACAC
201.                         GTTTCTTAGGGCAGGCAGACATACAAAC
202.    86J5TE1              GCCAATGTGTTCCTAGAGCGAC
203.                         GTTTCTTTTAAAGGGGGTAGGGTGTCACC
204.    8E.PENTA1            GGAAGGGAAAAGGACAAGGTTTTG
205.                         GTTTCTTAGCAAGAGCACTGGTGTAGGAGTC
206.    8EP04D05             GCTTTTCAAGCACTTGTCTC
207.                         TGGGATTGTGACTTACCATG
208.    8O16GT1              ACTTGGTGTCTTATAGAAAGGTG
209.                         GTTTCTTAGCTGTGTTTGCTGCATC
210.    8O16GT2              AGATGTGTGATGAGATGCAG
211.                         GTTTCTTCAAATAGTGCAACAAACCC
212.    AFM198YB10 (G)       TGTCATTCTGAAAGTGCTTCC
213.                         GTTTCTTCTGTAACTAACGATCTGTAGTGGTG
214.    AFM205YG5 (G)        TATCAAGGTAATATAGTAGCCACGG
215.                         AGGTCTTTCATGCAGAGTGG
216.    AFM206XB2 (G)        ATTGCCAAAACTTGGAAGC
217.                         AGGTGACATATCAAGACCCTG
218.    AFM283WH9 (G)        TTGTCAACGAAGCCCAC
219.                         GTTTCTTGCAAGATTGTGTGTATGGATG
220.    AFM324YH5 (G)        GCTCTCTATGTGTTTGGGTG
221.                         AAGAGTACGCTAGTGGATGG
222.    AFMA154ZD1 (G)       TCCATTAGACCCAGAAAGG
223.                         GTTTCTTCACCAGGCTGAGATGTTACT
224.    ASMI14               AATCGTTCCTTATCAGGTAATTTGG
225.                         GTTTCTTCAAAGAAAGCAATTCCATCATAACA
226.    ASMI14T              GCATTTGTTGAAGCAAGCGG
227.                         CTTTGTTCCTTGGCTGATGG
228.    CA11_11              AATAGTACCAGACACACGTG
229.                         CAATGGTTCACAGCCCTTTT
230.    CA39_2               AGCCTGGGAGACAGAGTGAG
231.                         GTTTCTTGCACTTTTTGGGGAAGGTG
232.    CD59 (L)             GTTCCTCCCTTCCCTCTCC
233.                         GTTTCTTTCAGGGACTGGATTGTAG
234.    D11S1301 (U)         GTGTTCTTTATGTGTAGTTC
235.                         GTTTCTTGGCAACAGAGTGAGACTCA
236.    D11S1751 (G)         GTGACATCCAGTGTTGGGAG
237.                         GTTTCTTCCTAAGCAAGCAAGCAATCA
238.    D11S1776 (G)         AAAGGCAATTGGTGGACA
239.                         GTTTCTTTTCAATCCTTGATGCAAAGT
240.    D11S1900 (U)         GGTGACAGAGCAAGATTTCG
241.                         GTTTCTTGTAGAGTTGAGGGAGCAGC
242.    D11S2008/D11S1392 (C)  CATCCATCTCATCCCATCAT
243.                         GTTTCTTTTCACCCTACTGCCAACTTC
244.    D11S2014 (C)         CCGCCATTTTAGAGAGCATA
245.                         GTTTCTTTTCTGGGACAATTGGTAGGA
246.    D11S4200 (G)         TTTGTGTTATTATTTCAGGTGC
247.                         GTTTCTTGTTTTTTGTTTCAGTTTAGGAAC
248.    D11S907 (G)          CATACCCAAATCGTTCTCTTCCTC
249.                         GTTTCTTGGAAAAGCAAAGGCATCGTAGAG
250.    D11S935 (G)          TACTAACCAAAAGAGTTGGGG
251.                         CTATCATTCAGAAAATGTTGGC
252.    GATA-P18492 (C)      GTATGGCAGTAGAGGGCATG
253.                         AAGGTTACATTTCAAGAAATAAAGT
254.    GATA-P6915 (C)       CTGTTCAGGCCTCAATATATACC
255.                         AAGAGGATAGGTGGGGTTTG
256.    L19CA3               CCTCCCACCTAGACACAAT
257.                         ATATGATCTTTGCATCCCTG
258.    L19PENTA1            AAGAAAGACCTGGAAGGAAT
259.                         AAACAGCAAAACCTCATCTC
260.    L19TETRA5            CCACCACTTATTACCTGCAT
261.                         TGAATGAATGAATGAACGAA
262.    LMP2                 AACTGTGATTGTGCCACTGCACTC
263.                         GTTTCTTCACCGCCTTTATCCCTCAAATG
264.    LMP3                 GATGGGTGGAGGGCAGTTAAAG
265.                         GTCAAGCAACTTGTCCAAGGCTAC
266.    LMP4                 CAGGCTATCAGTTTCCTTTGGAG
267.                         GGCAGGTAATACTGGAGAATTAGG
268.    LMP7                 GACGGATCTCAGAGCCACTC
269.                         GTTTCTTAAAAGATAAGGGCTTTTAAACC
270.    T18_5                AGTTTCACAGCTTGTTATGG
271.                         GGTTGATGAAGTGAGACTTT
272.    T29_9                ATGGTGGATGCATCCTGTG
273.                         GTTTCTTGTATTGACTCCTCCTCTGC
274.    774L                 CAGTAAACAT
275.                         TGTTGAGTGG
276.    774N                 TCTCCTCAATGTGCATGT
277.                         ATTCTACATA
278.    ASMI14               GTGTTTGCAT
279.                         ACAAGTTGGC
280.    CA11_11              TAGTACCAGA
281.                         TACATCCAAGAAAA

The source of each marker was Sequana Therapeutics, Inc. unless a
letter in parentheses is indicated after the name, where G =
Genethon; L = Nothen and Dewald (1995) Clin. Genet. 47:165; U =
the Utah genome center, see: The Utah Marker Development Group
(1995) Am. J. Hum. Genet. 57:619; C = the Cooperative Human
Linkage Center.

Table 1B
Repeat and flanking sequence

SEQ   Marker      Repeat and flanking sequence

282   CA39_2      GAGACTCTGA (CA)n AATATATATA
283   774F        TGTTGATCGC (CA)n AACCAAAATC
284   774J        AATGCATGTA (TG)2 TATA (TG)n GTGTGGTATG (TG)3 TACATATGCG
285   774O        CCTCCCAGAA (CA)n ATCATGATAA
286   L19PENTA1   AGACAGTCTCAAAAAAT (ATTTT)n AAAGAAAAAGCTGGATAAAT
287   65P14TE1    AACTAGCTTTAAGAAAATAAGAAGAAAAAGAAAGAAG (AAAG)2 TAAG
                  (AAAG)n AGAAAGAAAAG (AAAG)n AAAAG (AAAG)n AGGAATGATTGAC
288   65P14       CGCGCACATA (CA)n CCCTTTCTCT
289   774L        CAGTAAACAT (CA)n TGTTGAGTGG
290   774N        TCTCCTCAATGTGCATGT (GTGC)2 ATGA (GTGC)2 (AC)n ATTCTACATA
291   ASMI14      GTGTTTGCAT (GT)n T (GT)3 ACAAGTTGGC
292   CA11_11     TAGTACCAGA (CA)2 CG (TG)2 (CA)2 GGCAAGCG (CA)n C
                  (CA)3 TACATCCAAGAAAA



Genetic refinement of the ASTH1 region 

The microsatellite markers isolated from YACs from the ASTH1 region were
genotyped in both the Tristan da Cunha and Toronto cohorts. Genetic refinement
of the ASTH1 region was accomplished by applying the transmission/disequilibrium
test (TDT; Spielman et al. (1993) Am. J. Hum. Genet. 52:506) to genetic data from
the Tristan and Toronto populations, at markers throughout the ASTH1 region. The
TDT statistic reflects the level of association between a marker allele and disease
status. A multipoint version of the TDT test controls for variability in
heterozygosities between loci, and results in a smoother regional TDT curve than
would a plot of single locus TDT data. Significance of a TDT value is determined by
means of the χ² test; a χ² value of 3.84 or greater is considered statistically
significant at a probability level of 0.05. Figure 1 shows graphs of χ² values for key
ASTH1 region markers for both history of asthma with positive methacholine
challenge, for the Toronto triad families; χ² is plotted vs. genomic location of the
marker on the physical map.
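For orientation, the single-locus TDT for one allele is commonly written as a McNemar-type statistic, (b - c)² / (b + c), where b and c are the numbers of times heterozygous parents transmitted or did not transmit the allele to an affected child. The sketch below implements that simple form and the one-degree-of-freedom χ² tail probability. It is not the multipoint version described above, and the counts in the example are hypothetical.

```python
import math


def tdt_chi_square(transmitted, untransmitted):
    """McNemar-type TDT statistic for one allele from heterozygous parents."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    return (transmitted - untransmitted) ** 2 / total


def chi_square_p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: allele transmitted 30 times, not transmitted 10 times.
example_chi2 = tdt_chi_square(30, 10)
example_p = chi_square_p_value_1df(example_chi2)
```

With one degree of freedom, a statistic of 3.84 gives a tail probability of about 0.05, which is the threshold quoted in the text.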

The Toronto TDT peak is located at marker D11S2008 (χ² = 11.6, p < .0001).
The marker allele in disequilibrium is fairly rare (freq = 6%), representing the fourth
most common allele at this marker. The relative risk of affection vs. normal for this
allele is 5.25. This is also the peak marker for linkage and linkage disequilibrium in
Tristan da Cunha, indicating that the ASTH1 gene is very close to this marker. The
markers defining the limits of linkage disequilibrium were D11S907 and 65P14TE1.
The physical size of the refined region is approximately 100 kb.

A significant TDT test reflects the tendency of alleles of markers located near
a disease locus (also said to be in "linkage disequilibrium" with the disease) to
segregate with the disease locus, while alleles of markers located further from the
disease locus segregate independently of affection status. An expectation that
derives from this is that a population into which a disease gene (i.e. a
disease-predisposing polymorphism) was recently introduced would show statistically
significant TDT over a larger region surrounding the gene than would a population
in which the mutant gene had been segregating for a greater length of time. In the
latter case, time would have allowed more opportunity for markers in the vicinity of
the disease gene to recombine with it. This expectation is fulfilled in our
populations. The Tristan da Cunha population, founded only 10 generations ago,
shows a broader TDT curve than does the set of Toronto families, which are mixed
European in derivation and thus represent an older and more diverse, less recently
established population.
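The expectation described above can be illustrated with the standard population-genetic approximation that linkage disequilibrium between two loci decays by a factor of (1 - θ) per generation, where θ is the recombination fraction between them. The sketch below is a simplified model, not an analysis of the study data. The value θ = 0.01 and the 100-generation comparison are arbitrary assumptions, and only the 10-generation figure comes from the text.

```python
def ld_fraction_remaining(theta, generations):
    """Fraction of initial linkage disequilibrium left after a number of generations."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    return (1.0 - theta) ** generations


# Marker at an assumed recombination fraction of 0.01 from the disease locus.
young_population = ld_fraction_remaining(0.01, 10)    # about 0.90
older_population = ld_fraction_remaining(0.01, 100)   # about 0.37
```

Under this model a recently founded population retains most of its initial disequilibrium even at markers some distance from the gene, which is consistent with a broader TDT curve.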

Gene isolation and characterization

The tiling path of BACs, cosmids and PAC clones was subjected to exon
trapping and cDNA selection to isolate sequences derived from ASTH1 region
genes. Exon trap clones were isolated on the basis of size and ability to
cross-hybridize. Approximately 300 putatively non-identical clones were sequenced.
cDNA selection was performed with adult and fetal lung RNA using pools of tiling
path clones. The cDNA selection clones were sequenced and the sequences
assembled with those of the exon trap clones. Representative exon trapping
clones spanning each assembly were chosen, and arranged as "masterplates"
(96-well microtitre dishes) of clones. Exon trap masterplate clones and cDNA selection
clones were subjected to expression studies.

Human multi-tissue Northern blots were probed with PCR products of
masterplate clones. In some cases, exon trapping clones did not detect RNA
species, either because they did not represent expressed sequences, or
represented genes with very restricted patterns of expression, or due to small size
of the exon probe.

Masterplate clones detecting discrete RNA species on Northern blots were
used to screen lambda phage based cDNA libraries chosen on the basis of the
expression pattern of the clone. The sequences of the cDNAs were determined by
end sequencing and sequence walking. cDNAs were also isolated, or extended, by
5' and 3' rapid amplification of cDNA ends (RACE). In most cases, 5' RACE was
necessary to obtain the 5' end of the cDNA.

ASTH1I and ASTH1J were detected by exon trapping. ASTH1I exons
detected a 2.8 kb mRNA expressed at high levels in trachea and prostate, and at
lower levels in lung and kidney. ASTH1I exons were used as probes to screen
prostate, lung and testis cDNA libraries; positive clones were obtained from each of
these libraries. Isolation of an ASTH1I cDNA clone from testis demonstrates that this
gene is expressed in this tissue, and possibly others, at a level not detectable by
Northern blot analysis.

ASTH1J exons detected a 6.0 kb mRNA expressed at high levels in the
trachea, prostate and pancreas and at lower levels in colon, small intestine, lung
and stomach. Pancreas and prostate libraries were screened with exon clones
from ASTH1J. cDNA clone end sequences were assembled using Sequencher
(Alsbyte) with the sequences of the exon trapped clones, producing sequence
contigs used to design sequence walking and RACE primers. The additional
sequences produced by these methods were assembled with the original
sequences to produce longer contigs of cDNA sequences. It was evident from the
sequence assemblies that both ASTH1I and ASTH1J are alternatively spliced
and/or have alternative transcription start sites at their 5' ends, since not all clones
of either gene contained the same 5' sequence.

ASTH1J has three splice forms. In the alt1 form, found in prostate
and lung cDNA clones, the exons (illustrated in Figure 1) are found in
the order: 5' a, b, c, d, e, f, g, h, i 3'. A second form, alt2, in which the exon order is:
5' a2, b, c, d, e, f, g, h, i 3', was seen in a pancreas cDNA clone. A third form, alt3,
contains an alternate exon, a3, between exons a2 and b. The start codon is within
exon b, so that the open reading frame is identical for the three forms, which differ
only in the 5' UTR. The ASTH1J cDNAs shown as SEQ ID NO:2 (form alt1), SEQ
ID NO:3 (form alt2) and SEQ ID NO:4 (form alt3) are 5427, 5510 and 5667 bp in length,
respectively. The sequences of the entire protein coding region and alternate 5'
UTRs are provided. The 3' terminus, where the polyA tail is added, varies by 7 bp
between clones. The provided sequences are the longest of these variants. The
encoded protein product is provided as SEQ ID NO:5.

ASTH1I was seen in three isoforms denoted as alt1, alt2, and alt3. The
exons of ASTH1I and ASTH1J were given letter designations before the
directionality of the cDNA was known, so the order is different for the two genes. In
the alt1 form of ASTH1I, exons are in the following order: 5' i, f, e, d, c, b, a 3'. In
the alt2 form of ASTH1I, an alternative 5' exon, j, substitutes for exon i, with the
following exon arrangement: 5' j, f, e, d, c, b, a 3'. The alt3 form of the gene has
the exon order: 5' f, k, h, g, e, d, c, b, a 3'. The alternative splicing and start
codons in each of exons i, f and e give the three forms of ASTH1I protein different
amino termini. The common stop codon is located in exon a, which also contains a
long 3' UTR. Two polyadenylation signals are present in the 3' UTR; some cDNA
clones end with a polyA tract just after the first polyA signal and for others the polyA
tract is at the end of the sequence shown. Since the sequences shown for the alt1,
alt2, and alt3 forms of ASTH1I (2428 bp, 2280 bp and 2498 bp, respectively) are
close to the estimated Northern blot transcript size of 2.8 kb, these sequences are
essentially full length.



EST matches

The nucleotide sequences of the alt1, alt2 and alt3 forms of ASTH1J and the
alt1, alt2 and alt3 forms of ASTH1I were used in BLAST searches against dbEST in
order to identify EST sequences representing these genes. Perfect or near perfect
matches were taken to represent sequence identity rather than relatedness.
Accession numbers T65960, T64537, AA055924 and AA055327 represent the
forward and reverse sequences of two clones which together span the last 546 bp
(excluding the polyA tail) of the 3' UTR of ASTH1I. No ESTs spanned any part of
the coding region of this gene. One colon cDNA clone (accession number
AA149006) spanned 402 bp including the last 21 bp of the ASTH1J coding region
and part of the 3' UTR.

Intron/exon structure determination 

The genomic organization of genes in the ASTH1 region was determined by
comparison by BLAST of cDNA sequences to the genomic sequence of the region.
The genomic sequence of the ASTH1 region 5' to and overlapping ASTH1J is
provided in SEQ ID NO:1. Genomic structure of the ASTH1I and ASTH1J genes is
shown in Figure 1; the intron/exon junction sequences are in Table 2.

TABLE 2: Genomic organization of the ASTH1I and ASTH1J genes.
*Exonic sequences are upper case, flanking sequences lower case.

SEQ NO  Exon  Size of    Sequences at the ends of and flanking
              exon (bp)  the exons of ASTH1I and ASTH1J*

ASTH1I
293.    i     >214       ggaggctgagCAGGGGTGCC...
294.                     ...ACTCCCACAGgtacctgcag
295.    j     >66        ...CTGCCCTCACgtaagcgcct
                         tccttcacagGCCAGTGCAG...
325.                     ...GAACAAACTGgtgagtagta
326.    e     69         ttttttgtagAGCCTTCCAT...
327.                     ...AGCACAGTAGgtaactaact
328.    f     69         atggccacagATTTGTTGGA...
329.                     ...CTTCCTGTTGgtaagctgtc
330.    g     63         ttctccttagCAGAGTCACC...
331.                     ...AAAAAGCACAgtaagttggc
332.    h     196        ttttcatcagACCCGAGAGG...
333.                     ...GAGCTATGAGgtgaggagtt
334.    i     4457       tttgttacagATATTACTAC...
335.                     ...AGCCTGGAAAtgcgtgtttc
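The case convention used in Table 2 (lower case flank, upper case exon) makes the junction strings easy to parse. The sketch below splits an entry into its flanking and exonic parts and checks for the canonical splice-site dinucleotides: "ag" at the end of the upstream intron and "gt" at the start of the downstream intron. The function names are illustrative, and the check is a generic rule of thumb rather than a method described in this document.

```python
def split_junction(junction):
    """Split a Table 2 style entry into (flank, exon, side).

    side is "acceptor" when the flank precedes the exon (5' end of the exon)
    and "donor" when the flank follows it (3' end of the exon).
    """
    s = junction.replace(".", "").replace(" ", "")
    if s[0].islower():
        i = next(idx for idx, ch in enumerate(s) if ch.isupper())
        return s[:i], s[i:], "acceptor"
    i = next(idx for idx, ch in enumerate(s) if ch.islower())
    return s[i:], s[:i], "donor"


def has_canonical_dinucleotide(junction):
    """True if the flank ends in 'ag' (acceptor) or starts with 'gt' (donor)."""
    flank, _exon, side = split_junction(junction)
    return flank.endswith("ag") if side == "acceptor" else flank.startswith("gt")
```

Applied to the table, the entry at the 3' end of the last listed exon (SEQ 335) is not expected to pass this check, since its flank is sequence downstream of the transcript rather than an intron.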

The deduced ASTH1I and ASTH1J proteins 

The protein encoded by ASTH1J (SEQ ID NO:5) is 300 amino acids in
length. A BLASTP search of the protein sequence against the public nonredundant
sequence database (NCBI) revealed similarity to one protein domain of transcription
factors of the ets family. The ets family, named for the E26 oncoprotein which
originally defined this type of transcription factor, is a group of transcription factors
which activate genes involved in a variety of immunological and other processes, or
implicated in cancer. The family members most similar to ASTH1I and ASTH1J are:
ETS1, ESX, ETS2, ELF, ELK1, TEL, NET, SAP-1, NERF and FLI. Secondary
structure analysis and comparison of the protein sequence to the crystal structure of
the human ETS1-DNA complex (Werner et al. (1995) Cell 83:761) confirmed that it
has a winged helix-turn-helix motif characteristic of some DNA binding proteins
which are transcription factors.

Multiple sequence alignment of ASTH1I, ASTH1J, and other ETS-domain
proteins detected a second, N-terminal domain shared by ASTH1I, ASTH1J and
some, but not all, ETS-domain proteins. Conservation of this motif has been
observed (Tei et al. (1992) Proc. Natl. Acad. Sci. USA 89:6856-6860), and its
involvement in protein self-association has been documented for TEL, an
ETS-domain protein, upon its fusion with platelet-derived growth factor β receptor (Carrol
et al. (1996) Proc. Natl. Acad. Sci. USA 93:14845-14850). Alignment of the
N-terminal conserved domain in the ETS proteins was converted into a generalized
sequence profile to scan the protein databases using the Smith-Waterman
algorithm. This search revealed that the N-terminal domain in ASTH1I, ASTH1J and
other ETS-domain proteins belongs to the SAM-domain family (Schultz et al. (1997)
Protein Science 6:249-253). SAM domains are found in diverse developmental
proteins where they are thought to mediate protein-protein interactions. Thus, both
ASTH1I and ASTH1J are predicted to contain two conserved modules, the
N-terminal protein interaction domain (SAM-domain) and the C-terminal DNA-binding
domain (ETS-domain). The sequence segments between these two domains are
predicted to have elongated, non-globular structure and may be hinges between the
two functional domains in ASTH1I and ASTH1J.

The ASTH1I alt1 (SEQ ID NO:7), alt2 (SEQ ID NO:9) and alt3 (SEQ ID NO:11) forms are
265, 255 and 164 amino acids in length, respectively, and differ at their 5' ends.
The ASTH1I and ASTH1J proteins show similarity to each other in the ets domain and
between ASTH1J exon c and ASTH1I exon e. They are more related to each other than to
other proteins. Over the ets domain they are 66% similar (i.e., have amino acids with
similar properties in the same positions) and 46% identical to each other. All three
forms of ASTH1I have the helix-turn-helix motif located near the carboxy terminal end
of the protein.
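
The distinction between percent identity and percent similarity drawn above can be illustrated with a short sketch. The grouping of amino acids by "similar properties" below is an assumption made for illustration (the text does not say which scoring scheme was used), and the two aligned strings are invented toy sequences, not ASTH1I or ASTH1J.

```python
# Hypothetical similarity classes; the scheme actually used is not stated in the text.
SIMILAR_GROUPS = [
    set("ILVM"),   # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("KRH"),    # basic
    set("DE"),     # acidic
    set("STNQ"),   # polar uncharged
    set("AG"),     # small
]


def identity_and_similarity(aligned_a: str, aligned_b: str):
    """Percent identity and percent similarity over aligned, ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Skip columns where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    identical = sum(x == y for x, y in pairs)
    similar = sum(
        x == y or any(x in g and y in g for g in SIMILAR_GROUPS) for x, y in pairs
    )
    n = len(pairs)
    return 100.0 * identical / n, 100.0 * similar / n


# Toy alignment: L and D are identical, K/R are similar, E/A are neither.
print(identity_and_similarity("KLDE", "RLDA"))
```

Identical positions count toward both figures, which is why the similarity value is never lower than the identity value.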

The alternate forms of the ASTH1I protein may differ in function in critical ways.
The activity of ets transcription factors can be affected by the presence of
independently folding protein structural motifs which interact with the ets protein
binding domain (helix loop helix). The differing 5' ends of the ASTH1I proteins may
help modulate activity of the proteins in a tissue-specific manner.

Polymorphism analysis of ASTH1I and ASTH1J

Affected and unaffected individuals from the Toronto cohort were used to determine
sequence variants, as were approximately 25 controls derived from populations not
selected for asthma. Affected and unaffected individuals from the Tristan da Cunha
population were also chosen; this set was selected to represent all the major
haplotypes for the ASTH1 region in that population. This ensured that all chromosome
types for Tristan were included in the analysis.

Polymorphism analysis was accomplished by three techniques: comparative
(heterozygote detection) sequencing, radioactive SSCP and fluorescent SSCP.
Polymorphisms found by SSCP were sequenced to determine the exact sequence
change involved.

PCR and sequencing primers were designed from genomic sequence flanking each exon of
the coding region and 5' UTRs of ASTH1I and ASTH1J. For fluorescent SSCP, the forward
and reverse PCR primers were labeled with different dyes to allow visualization of
both strands of the PCR product. In general, a variant seen in one strand of the
product was also apparent in the other strand. For comparative sequencing,
heterozygotes were also detected in sequences from both DNA strands.

Polymorphisms associated with the ASTH1I locus are listed in Table 3. The sequence
flanking each variant is shown. Polymorphisms were also deduced from comparison of
sequences from multiple independent cDNA clones spanning the same region of the
transcripts, and comparison with genomic DNA sequence. The polymorphisms in the long
3' UTR regions of these genes were found by this method. One polymorphism in each
gene is associated with an amino acid change in the protein sequence. An
alanine/valine difference in exon c of ASTH1J is a conservative amino acid change. A
serine/cysteine variant in exon g of ASTH1I is not a conservative change, but would
be found only in the alt3 form of the protein.

The polymorphisms in the ASTH1I and ASTH1J transcribed regions were genotyped in the
whole Tristan da Cunha and Toronto populations, as well as in a larger sample of
non-asthma selected controls, by high throughput methods such as OLA (oligonucleotide
ligation assay; Tobe et al. (1996) Nucl. Acids Res. 24:3728) or Taqman (Holland et al.
(1992) Clin. Chem. 38:462), or by PCR and restriction enzyme digestion. The
population-wide data were used in a statistical analysis for significant differences
in the frequencies of ASTH1I or ASTH1J alleles between asthmatics and non-asthmatics.
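
The text does not name the statistical test that was applied. As one common possibility, offered only as an assumption, a 2x2 chi-square test on allele counts can be sketched as follows. The counts used are hypothetical placeholders, not data from the study.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction).

    Layout of the 2x2 table of allele counts:
                     allele 1   allele 2
      asthmatic         a          b
      non-asthmatic     c          d
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical counts for illustration only.
stat = chi_square_2x2(30, 70, 15, 85)
CRITICAL_P05_DF1 = 3.841  # standard chi-square critical value for p = 0.05, 1 df
print(round(stat, 3), stat > CRITICAL_P05_DF1)
```

With many polymorphisms tested in parallel, as in Table 3, a correction for multiple comparisons would normally be applied on top of a per-marker test of this kind.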
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TABLE 3: POLYMORPHISMS IN THE ASTH1I AND ASTH1J GENES. 



SEQ   Polymorphism Location               Sequence

ASTH1I Transcribed region

16.   EXON B (+)170                       ACAGAATGACETATGAAAAGT
17.   INTRON D (+)15                      GTAACCAAGCECAAGCCACCC
18.   INTRON F (+)24                      AAGGAGCCCAYCTGAGTGCAG
19.   EXON G (+)62 ser/cys                CGTTCCATCTSTGCTCTGTGC
20.   EXON H (+)77                        AGCGCCTCGGXTGGCTGAGGG
21.   EXON A 3' UTR (+)1176               TGTATTCAAGXGCTATAACAC
22.   EXON I (+)76                        CACTGAGAAGCC£ZjlACAGGCCTGT
23.   EXON I (+)86                        CCCACAGGCCJSGTCCCTCCAA
24.   INTRON J (+)93                      CGTCCATCTCYAGCTCCAGGG

ASTH1J Transcribed region

25.   EXON A 5' UTR (+)38                 GACTTGATAAYGCCCGTGGTG
26.   EXON A 5' UTR (+)39                 ACTTGATAACECCCGTGGTGC
27.   EXON A 5' UTR (+)99                 CTCCCCTCCAMGAGCCACAGC
28.   INTRON A (+)224/225                 ATTTCCTGCATXZ^GTCTGGACTT
29.   INTRON A (+)48                      ATCCAAACACYTGAGTGGAAA
30.   EXON A3 (+)28                       AGTTTCCTCARTGCGGGAGCT
31.   EXON C (+)158                       GCGAGCACCTYTGCAGCATGA
32.   EXON C (+)190 ala/val               TTCACCCGGGXGGCAGGGACG
33.   INTRON D (-)36/37                   CTGGGGAAAAlGA±/T.GATCGCTGAC
34.   INTRON F (-)22                      GTCAATTAAAYGGCTCTCATT
35.   INTRON G (-)27                      TAGATCATTCSTAACCTGCCT
36.   EXON I (3' UTR) (+)22               AAAGAGAAATMCTGGAGCGTG
37.   EXON I (3' UTR) (+)220              ATGAGGGGAAMAAGAAACTAC
38.   EXON I (3' UTR) (+)475              TTTTGTATGTjiACATGATTTA
39.   EXON I (3' UTR) (+)871              AGCTTGGTTCXTTTTTGCTCC
40.   EXON I (3' UTR) (+)1084             TTGACACCAGRAACCCCCCAG

5' to ASTH1J

41.   CAAT box -165                       AAATGAGCCARTGTTTGTAAT
42.   5PW1J_P01+399                       ATCCATTTTGXATTCCTCATT
43.   5PW1J_P01+1604                      CTGGAGCTCARACCAGACAGC
44.   5PW1J_P02+1382                      GCCAGTGCAGSCATCATTACC
45.   5PW1J_P03+128                       AGTTCAAATCSTAATTTTTAT
46.   5PW1J_P03+556                       TCATCAGAATXTAAATCTCCC
47.   5PW1J_P03+712                       GGAGATTCAGA/-TGAAGCAAGA
48.   5PW1J_P03+781                       TTTTTCCACAXCCAGCCTGGC
49.   5PW1J_P03+791                       CCCAGCCTGGXGAACCCTGGC
50.   5PW1J_P03+820                       CTCTTCATCAXGGTCAAATAC
51.   5PW1J_P03+1530                      CAACTTGCTGXCAAAGTGCTG
52.   5PW1J_P03+1605                      TACTATGTGCXAGATACTAAG
53.   5PW1J_P04+542/543                   ATGCCACTTTEEACAACTTGAG
54.   5PW1J_P04+973                       CGCATGCCTGRAAAGAAGAGA
55.   5PW1J_P04+1079                      GGATAAGCACMAGTGAGCCTG
56.   5PW1J_P04+1153                      AAAGCCAGACRGCAACTTGTG
57.   5PW1J_P04+1430                      TCTCAAAAAGRGTGATAGGAG
58.   5PW1J_P05+334                       TCTGAATCCTSTCTCCTCCTT
59.   5PW1J_P05+749                       TAGAACCAGGHTGTGGGACCA
60.   5PW1J_P05+915                       TTCTTGTGTCRGGCGCAAAAC
61.   5PW1J_P06+529                       AACCAACATGEAGAAACCCCA
62.   5PW1J_P06+1290                      AATAAACTATRGTTCACCTAG
63.   5PW1J_P06+1573                      ACATATTTGTRTCTCATATGA
64.   5PW1J_P06+1661                      CAAAGCAGTTXCTAATAATCC
65.   5PW1J_P07+335                       AGATCCTAACXGGGGCCTCCT
66.   5PW1J_P07+731                       CTCTTTCTCTXTGCTTCCTCC
67.   5PW1J_P07+1024                      TTAGGAATCCiJCAAATATGTA
68.   5PW1J_P07+1610                      GTCTGACTCCRCCTCCCTCAT
69.   5PW1J_P08+398                       GAATCACATCRTGAGAAATGT
70.   5PW1J_P08+439                       AATTCAATCCXTCACAGACTT
71.   5PW1J_P08+580                       GTGTAGCCAGRGTTGCTAATT
72.   5PW1J_P08+762                       CCTAGAAATASCCAAGGGCAC
73.   5PW1J_P08+952                       AAATTCTCATRCCTCACCCTC
74.   5PW1J_P08+1172                      TCCCACCCCTRTCACCTTCAT
75.   5PW1J_P08+1393                      CCTCATTCTCRGAAGCCAACA
76.   5PW1J_P08+1433                      GAAGAGCCGTXCAGTCCCTTT
77.   5PW1J_P08+1670                      TCCATAGGCTXTTTATTTGGC
78.   5PW1J_P08+1730                      TCGTTTAGTAXACAGGCTTTG
79.   5PW1J_P09+59                        GCCTCAGTTGXCCCAGCTATA
80.   5PW1J_P09+145                       AGCAAAATGCHCTATGCACTG
81.   5PW1J_P09+892                       GTGTCCTGAC(TTGCACTCCAC)/-ACACTGCCTG
82.   5PW1J_P10+1070                      ATCAGATAACRCCTACACTTA
83.   5PW1J_P10+1511                      TCTCTCTTCTSCCTGCCCTGT
84.   5PW1J_P09+1132                      TGGACACAGGRAGGGGAATAT
85.   [illegible]                         TGTCACTTGCSCATACAAGGC
86.   [illegible]                         ATCATCAGAIXAGCCCAGAAT
87.   [illegible]                         TCAACAGAGASAGTTAATGGT
88.   [illegible]                         [illegible]
89.   [illegible]                         [illegible]

90.   5PW1J_W1R1-3160                     GATTCCTTAAYGCTTGATACT
91.   5PW1J_W1R1-3787                     CCTCCTCCAGYACCAAAGTGG
92.   W1J_CD+24                           ATGGCCACAGRTCAAATCCTG
93.   W1J_CA+564                          ACTGAGTGTTYATGCCAATTT

5' to ASTH1I

94.   WI_CL+94                            GACAAGCCCTRTCTGACACAC
95.   WI_CN+134                           TGAAAAGCCTYCTTGCTGCCT
96.   WI_CQ-28                            TCCTGGAGTTYCTTTGCTCCC
97.   WI_CO+39                            GATTCCAAATWAACTAAAGAT
98.   P14-16+191662                       GACCTCAAGTCRTCCACCCGCC
99.   P14-16+192592                       AACAAATACTMCCCCGCAACCC
100.  P14-16+192762                       ATTTTTTTTTT/-AAGGAAAATA
101.  P14-16+195066                       AAATTTCCCCMAAACAAGCAG
102.  P14-16+196590                       GAGAAAGGGTRTGTGTGTGTG
103.  P14-16+196617                       GTGTGTGTGTGT-/GTGTATGTGCGCGTG
104.  P14-16+196902                       ATCGGGAACCYCATACCCCAA
105.  P14-16+198040                       TTTGTTTCGCMATGAGGTACG
106.  P14-16+198240                       TGAGGGTGTTSTGGGCTGGAC
107.  P14-16+198840                       TCTTCATTGGYATCTGAATGT
108.  P14-16+200120                       GCGAGCACCTYTGCAGCATGA
109.  P14-16+200617                       a^ACCCCCCCCMCACACACACA
110.  J5-16+4454                          TCAGTGCTCTSTAATCAGTCA
111.  J5-16+4825                          TCTTTGTGa^A-/(GA)aAATTAGTCTG *
112.  J5-16+5426                          GCTGCCCTGASAGCTGGGCCA
113.  J5-16+5623                          CCTTCTGATCYTTGTTTGCTG
114.  J5-16+7386                          GGAACACTGAiCTCTTGATTAG
115.  J5-16+7904                          TAGGCTTCTCYTGATAATTGA
116.  J5-16+8055                          TCTTAa^ATa^VMTTGGCTTGTA
117.  J5-16+10595                         TAGATCATTCRTAACCTGCCT
118.  J5-16+11140                         ATGAGGGGAAMAAGAAACTAC
119.  J5-16+12004                         TTGACACCAGRAACCCCCCAG
120.  J5-16+12219                         TGTTTTa^ATRTTAGGGACAA
121.  J5-16+12303                         GTa^xAGCATAGYa^xATGTAGCAG
122.  [illegible]                         [illegible]
123.  J5-16+14120                         GACCCAGGTTETGAGTTTTCC
124.  ASTH1I, exon B +169                 GACAGAATGAXATATGAAAAG
125.  ASTH1I, exon I +69                  TGTGTGACACYGAGAAGCCCA
126.  ASTH1J, exon C +56
127.  5' ASTH1J, WI_Cg -9                 CCTGGGAGCARGTATTGCATT

ASTH1J Intron A

128.  WIJ_Ia01 +39                        AGATTTGAGGYCTCAGGTCCC
129.  WIJ_Ia01 +140                       TGTCAATGTCRCATGATAAGC
130.  WIJ_Ia01 +678                       TTGCCCCAGTKTTCTCCGGGC
131.  WIJ_Ia01 +855                       TATGAGCAGCRTAGGGAGTGG
132.  WIJ_Ia01 +929                       AGTTGACTGA(AAAA)/-TAAATAAGAC
133.  WIJ_Ia03 +362                       ATTCAAATAGSCTCTAGAAAC
134.  WIJ_Ia03 +918                       CCCAGAATTTMATATCCATTC
135.  WIJ_Ia03 +943                       TGACCCAACARAAACTCACTG
136.  WIJ_Ia03 +1569                      CCAGAATATAWCATCAGCCCT
137.  WIJ_Ia03 +1580                      CATCAGCCCTWCTGAGGAGAT
138.  WIJ_Ia02 +435                       CCAGAACAGAYTTTATTCTGT
139.  WIJ_Ia02 +583                       TTCAGCCATCYTTCCAGTTGT
140.  WIJ_Ia02 +643                       TCACTAACTCWAAAACGACAT
141.  WIJ_Ia02 +648                       AACTCAAAAAVGACATCCTCC
142.  WIJ_Ia02 +1048                      GAACTGCACARGTTGCACACT
143.  WIJ_Ia02 +1061                      TTGTTCCATG^ACTACCTCCT
144.  WIJ_Ia02 +1142                      ACAGCAGGCAYTCAACAAATT
145.  WIJ_Ia04 +410                       [illegible]
146.  WIJ_Ia04 +1056                      TAGGCTGTTCYCTGCCATCAC
147.  WIJ_Ia05 +1484                      GTGCTCTGGGMCACACAGCTC
148.  WIJ_Ia05 +1103                      AGACCCGATARGAGCTCCTTC
149.  WIJ_Ia05 +1823                      CATCTTGCGCRGTCATGTAAG
150.  WIJ_Ia05 +1852                      CAGCACAGCTRTTCCCTCAAA
151.  WIJ_Ia05 +1906                      TTTGGAAACAYGGTGAAGTAT
152.  WIJ_Ia05 +1913                      ACACGGTGAARTATTGTCTCC
153.  WIJ_Ia06 +794                       AAAAGTGGATMCTCTGCAAAC
154.  WIJ_Ia06 +814                       CTTCAAATGCRGCTATTAAAG
155.  WIJ_Ia06 +1197                      CCTGGGAGCAYGGTAAATCAG
156.  WIJ_Ia06 +1231                      TGAAAATGTCECTTTCTCACCT
157.  WIJ_Ia06 +1256                      CCTGATATTTRCCAACAAGAA
158.  WIJ_Ia06 +1535                      AAAGGGTTAGYTTGTCCCCTT
159.  WI_Caa +163                         TGAAAATAAAASACAATTTTTT

The sequences are listed with the variant residues represented by the appropriate single letter
designation, i.e. A or G is shown by "R". The variant residues are underlined. Where the
polymorphism is a deletion, the deleted residues are underlined, and the alternative form is shown
as a "-".

a Where intron 'a' is the intron 3' to exon 'a', etc.

b Position numbers correspond to the position within the intron or exon, with nucleotide +1 being the
5'-most base of the exon or the intron. Alternatively, negative numbers denote the number of bases
from the 3' end of an intron.

c Position in cDNA = position # for the exon a form of ASTH1J or the exon i form of ASTH1I.
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d Exonic sequences are uppercase, intronic sequences lower case. 
UTR = untranslated region. N/A = not applicable. 
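
The single-letter designations described in the legend are the standard IUPAC nucleotide ambiguity codes. A minimal sketch of expanding one table entry into its two allelic sequences is given below. The function name is hypothetical, only the two-base codes are handled, and the example uses entry 41 (the CAAT box polymorphism) as printed in Table 3.

```python
# Standard IUPAC two-base ambiguity codes.
IUPAC_TWO_BASE = {
    "R": "AG", "Y": "CT", "M": "AC",
    "K": "GT", "S": "CG", "W": "AT",
}


def expand_alleles(flanked: str):
    """Return both allelic sequences for an entry holding exactly one ambiguity code."""
    hits = [i for i, base in enumerate(flanked) if base in IUPAC_TWO_BASE]
    if len(hits) != 1:
        raise ValueError("expected exactly one two-base ambiguity code")
    i = hits[0]
    return [flanked[:i] + base + flanked[i + 1:] for base in IUPAC_TWO_BASE[flanked[i]]]


# Entry 41, CAAT box -165.
for allele in expand_alleles("AAATGAGCCARTGTTTGTAAT"):
    print(allele)
```

The two expanded sequences contain GCCAAT and GCCAGT respectively. Entries written in the insertion/deletion notation (for example "A/-") are not covered by this sketch.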

Cross-species sequence conservation

Cross-species sequence conservation can reveal the presence of functionally important
areas of sequence within a larger region. Approximately 90 kb of sequence lie between
ASTH1I and ASTH1J, which are transcribed in opposite directions (Figure 1). The
transcriptional orientation of these genes may allow coordinate regulation of their
expression. The expression patterns of these genes are similar but not identical.
Sequences found 5' to genes are critical for expression. To search for regulatory or
other important regions, the genomic sequence between ASTH1I and ASTH1J was examined,
and plasmid clones derived from genomic sequencing experiments were chosen for
cross-species hybridization experiments. The criterion for probe choice was a lack of
repeat elements such as Alu or LINEs. Inserts from these clones were used as probes on
Southern blots of EcoRI-digested human, mouse and pig or cow genomic DNA. Probes that
produced discrete bands in more than one species were considered conserved.

Conserved probes clustered in four locations. One region was located 5' to ASTH1I and
spanned exon j of this gene. A second conserved region was located 5' to ASTH1J,
spanning approximately 10 kb and beginning 6 kb 5' to ASTH1J exon a (and is within
SEQ ID NO:1). Two other clusters of conserved probes were noted in the region between
ASTH1I and ASTH1J. They are approximately 10 and 6 kb in length.

Promoters, enhancers and other important control regions are generally found near the
5' ends of genes or within introns. Methods of identifying and characterizing such
regions include: luciferase assays, chloramphenicol acetyl transferase (CAT) assays,
gel shift assays, DNase I protection assays (footprinting), methylation interference
assays, DNase I hypersensitivity assays to detect functionally relevant chromatin-free
regions, other types of chemical protection assays, transgenic mice with putative
promoter regions linked to a reporter gene such as β-galactosidase, etc. Such studies
define the promoters and other critical control regions of ASTH1I and ASTH1J and
establish the functional significance of the evolutionarily conserved sequences
between these genes.

Discussion 

The ASTH1 locus is associated with asthma and bronchial hyperreactivity. ASTH1I and
ASTH1J are transcription factors expressed in trachea, lung and several other tissues.
The main site of their effect upon asthma may therefore be in trachea and lung
tissues. Since ets family genes are transcription factors, a function for ASTH1I and
ASTH1J is activation of transcription of particular sets of genes within cells of the
trachea and lung. Cytokines are extracellular signalling proteins important in
inflammation, a common feature of asthma. Several ets family transcription factors
activate expression of cytokines or cytokine receptors in response to their own
activation by upstream signals. ELF, for example, activates IL-2, IL-3, IL-2 receptor
α and GM-CSF, factors involved in signaling between cell types important in asthma.
NET activates transcription of the IL-1 receptor antagonist gene. ETS1 activates the
T cell receptor α gene, which has been linked to atopic asthma in some families
(Moffatt et al. (1994) supra).

Activation of genes involved in inflammation by other members of the ets family
suggests that the effect of these ASTH1 genes on development of asthma is exerted
through influencing cytokine or receptor expression in trachea and/or lung. Cytokines
are produced by structural cells within the airway, including epithelial cells,
endothelial cells and fibroblasts, bringing about recruitment of inflammatory cells
into the airway.

A model for the role of ASTH1I and ASTH1J in asthma that is consistent with the
phenotype linked to ASTH1, the expression pattern of these genes, the nature of the
ASTH1I/J genes, and the known function of similar genes is that aberrant function of
ASTH1I and/or ASTH1J in trachea or lung leads to altered expression of factors
involved in the inflammatory process, leading to chronic inflammation and asthma.

Functional analysis of an ASTH1J promoter sequence variant and location of the
ASTH1J promoter

Primer extension analyses performed using total RNA isolated from both bronchial and
prostate epithelial cells have revealed one major and five minor transcription start
sites for ASTH1J. The major site accounts for more than 90% of ASTH1J gene
transcriptional initiation. None of these sites are found when the primer extension
analysis is performed using mRNA isolated from human lung fibroblasts that do not
express ASTH1J.

Identification of the ASTH1J transcriptional start site has allowed the localization
of a putative TATA box (TTTAAAA) between positions -24 and -30 (24 to 30 bp 5' to the
transcription start site). Although the sequence is not that of a typical TATA box,
it conforms to the consensus sequence (TATAAAA) for TATA box protein binding as
compared with 389 TATA elements (Transfac database:
http://transfac.gbf-braunschweig.de/, ID: V$TATA_01).
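
How close the putative element is to the consensus can be illustrated with a simple position-by-position comparison. This is only a sketch: it counts mismatches against the consensus string quoted above and does not reproduce the Transfac matrix scoring that the comparison in the text relies on.

```python
def count_mismatches(site: str, consensus: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(site) != len(consensus):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(site, consensus))


putative_tata = "TTTAAAA"   # element found 24 to 30 bp 5' to the start site
consensus = "TATAAAA"       # consensus quoted in the text

# Positions -30 through -24 inclusive span seven bases, matching the element length.
span = len(range(-30, -23))
print(span, count_mismatches(putative_tata, consensus))
```

The two heptamers differ at a single position (the second base).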


Analysis of the CAAT box "G" polymorphism by gel shift assay

Binding of nuclear proteins to a polymorphism in the GCCAAT motif (GCCAAT or GCCAGT)
found at position -140 (140 bp 5' to the transcription start of ASTH1J as defined by
primer extension experiments, previously referred to as "-165 bp") has been assessed
using electrophoretic mobility shift assays. These experiments showed a marked
difference when binding of nuclear proteins to radioactively labeled double-stranded
oligonucleotides containing the normal "A" vs the mutant "G" nucleotide was examined.
A specific set of nuclear proteins was able to bind to the normal oligonucleotide,
but did not bind to the "G" oligonucleotide. The specificity of the DNA binding
complexes was further addressed by competition with either normal or mutant unlabeled
oligonucleotides. Addition of increasing amounts of normal unlabeled oligonucleotide
effectively competed binding of nuclear proteins to the labeled normal
oligonucleotide, while the addition of increasing amounts of unlabeled "G"
oligonucleotide did not.

The GCCAAT cis-element is found in many promoters at various locations relative to
genes, as well as in distal enhancer elements. There is no known correlation between
location of these elements and activity. Both positive and negative regulatory
trans-acting factors are known to bind this class of cis element. These factors can
be grouped into the NF-1 and C/EBP families.

The nuclear factor-1 (NF-1) family of transcription factors comprises a large group
of eukaryotic DNA binding proteins. Diversity within this gene family is contributed
by multiple genes (including: NF-1A, NF-1B, NF-1C and NF-1X), differential splicing
and heterodimerization.

Transcription factor C/EBP (CCAAT-enhancer binding protein) is a heat stable,
sequence-specific DNA binding protein first purified from rat liver nuclei. C/EBP
binds DNA through a bipartite structural motif and appears to function exclusively in
terminally differentiated, growth arrested cells. C/EBPα was originally described as
NF-IL-6; it is induced by IL-6 in liver, where it is the major C/EBP binding
component. Three more recently described members of this gene family, designated
CRP1, C/EBPβ and C/EBPδ, exhibit similar DNA binding specificities and affinities to
C/EBPα. Furthermore, C/EBPβ and C/EBPδ readily form heterodimers with each other as
well as with C/EBPα.

Members of the C/EBP family of transcription factors, but not members of the NF-1
family, bind to the ASTH1J promoter region, as determined by the use of commercially
available antibodies (Santa Cruz Biotechnologies, Santa Cruz, CA) that recognize all
NF-1 and C/EBP family members known to date, in electrophoretic mobility shift
assays.

Fabricating a DNA array of polymorphic sequences 

The DNA array is made by spotting DNA fragments onto glass microscope slides which
are pretreated with poly-L-lysine. Spotting onto the array is accomplished by a
robotic arrayer. The DNA is cross-linked to the glass by ultraviolet irradiation, and
the free poly-L-lysine groups are blocked by treatment with 0.05% succinic anhydride,
50% 1-methyl-2-pyrrolidinone and 50% borate buffer.

The spots on the array are oligonucleotides synthesized on an ABI automated
synthesizer. Each spot is one of the alternative polymorphic sequences indicated in
Tables 3 to 8. For each pair of polymorphisms, both forms are included. Subsets
include (1) the ASTH1J polymorphisms of Table 3; (2) the ASTH1I polymorphisms of
Table 3; and (3) the polymorphisms of Table 4. Some internal standards and negative
control spots, including non-polymorphic coding region sequences and bacterial
controls, are included.

Genomic DNA from patient samples is isolated, amplified and subsequently labeled with
fluorescent nucleotides as follows: isolated DNA is added to a standard PCR reaction
containing primers (100 pmoles each), 250 µM nucleotides, and 5 Units of Taq
polymerase (Perkin Elmer). In addition, fluorescent nucleotides (Cy3-dUTP (green
fluorescence) or Cy5-dUTP (red fluorescence), sold by Amersham) are added to a final
concentration of 60 µM. The reaction is carried out in a Perkin Elmer thermocycler
(PE9600) for 30 cycles using the following cycle profile: 92°C for 30 seconds, 58°C
for 30 seconds, and 72°C for 2 minutes. Unincorporated fluorescent nucleotides are
removed by size exclusion chromatography (Microcon-30 concentration devices, sold by
Amicon).
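
As a rough planning aid, the programmed hold time implied by the cycle profile above can be worked out as follows. This is a sketch only: the dictionary keys are hypothetical labels, and ramp times between temperatures, which depend on the instrument, are not included.

```python
CYCLES = 30
# Hold time in seconds for each step of one cycle, as stated in the text.
STEP_SECONDS = {"denature_92C": 30, "anneal_58C": 30, "extend_72C": 120}


def total_hold_minutes(cycles: int, steps: dict) -> float:
    """Total programmed hold time in minutes, excluding temperature ramps."""
    return cycles * sum(steps.values()) / 60


print(total_hold_minutes(CYCLES, STEP_SECONDS))
```

Each cycle holds for 3 minutes, so 30 cycles amount to 90 minutes before ramping is taken into account.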

Buffer replacement, removal of small nucleotides and primers, and sample
concentration are accomplished by ultrafiltration over an Amicon microconcentrator-30
(mwco = 30,000 Da) with three changes of 0.45 ml TE. The sample is reduced to 5 µl
and supplemented with 1.4 µl 20X SSC and 5 µg yeast tRNA. Particles are removed from
this mixture by filtration through a pre-wetted 0.45 µm microspin filter
(Ultrafree-MC, Millipore, Bedford, MA). SDS is added to a 0.28% final concentration.
The fluorescently-labeled cDNA mixture is then heated to 98°C for 2 min., quickly
cooled and applied to the DNA array on a microscope slide. Hybridization proceeds
under a coverslip, and the slide assembly is kept in a humidified chamber at 65°C for
15 hours.
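
The approximate SSC strength of the hybridization mixture follows from a simple dilution calculation. The sketch below assumes that the volumes in the preceding paragraph are in microliters (5 and 1.4) and ignores the small volumes contributed by the tRNA and SDS additions, which the text does not state.

```python
def final_ssc_strength(sample_ul: float = 5.0,
                       ssc_stock_ul: float = 1.4,
                       stock_strength_x: float = 20.0) -> float:
    """Final SSC strength (in 'X' units) after mixing sample and stock: C1*V1 = C2*V2."""
    total_ul = sample_ul + ssc_stock_ul
    return stock_strength_x * ssc_stock_ul / total_ul


print(round(final_ssc_strength(), 3))
```

Under these assumptions the mixture is a little over 4X SSC.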

The slide is washed briefly in 1X SSC and 0.03% SDS, followed by a wash in 0.06% SSC.
The slide is kept in a humidified chamber until fluorescence scanning is done.

Fluorescence scanning and data acquisition. Fluorescence scanning is set for 20
microns/pixel and two readings are taken per pixel. Channel 1 is set to collect
fluorescence from Cy3 with excitation at 520 nm and emission at 550-600 nm. Channel 2
collects signals excited at 647 nm and emitted at 660-705 nm, appropriate for Cy5. No
neutral density filters are applied to the signal from either channel, and the
photomultiplier tube gain is set to 5. Fine adjustments are then made to the
photomultiplier gain so that signals collected from the two spots are equivalent.



Construction of an asth1-J Transgenic Mouse
Isolation of mouse asth1-J genomic fragment:

Phage MW1-J was isolated by screening a mouse 129Sv genomic phage 
library (Stratagene) with the 443bp BamHI-Smal fragment from the 5' region of the 
human asth1-J cDNA clone PA1001A as probe. The 23kb insert in MW1-J was 
sequenced. 


Assembly of asth1-Jexb targeting construct:

A 2.65kb SacI fragment (bp7115-bp9765) from MW1-J was isolated, cloned into the SacI
site of pUC19, isolated from the resultant plasmid as an EcoRI-XbaI fragment,
inserted into the EcoRI-XbaI sites of pBluescript II KS+ (Stratagene), and the 2.5kb
XhoI-MluI fragment isolated. A 5.4kb HindIII fragment (bp11515-bp16909) was isolated
from MW1-J, inserted into the HindIII site of pBluescript II KS+, reisolated as a
XhoI-NotI fragment, inserted into the XhoI-NotI sites of pPNT, and the 9.5kb
XhoI-MluI fragment isolated. The two XhoI-MluI fragments were ligated together to
produce the final targeting construct plasmid, asth1exb. Asth1exb was linearized by
digestion with NotI and purified by CsCl banding.

Identification of targeted ES clones: 

Approximately 10 million RW4 ES cells (Genome Systems) were electroporated with 20 µg
of linearized asth1exb and grown on mitomycin C inactivated MEFs (Mouse Embryo
Fibroblasts) in ES cell medium (DMEM + 15% fetal bovine serum + 1000 U/ml LIF (Life
Technologies)) and 400 µg/ml G418. After 24-48 hrs, the cells were refed with ES cell
medium. After 7-10 days in selection culture approximately 200 colonies were picked,
trypsinized, grown in 96 well microtiter plates, and expanded in duplicate 24 well
microtiter plates. Cells from one set of plates were trypsinized, resuspended in
freezing medium (Joyner, A., ed., Gene Targeting, A Practical Approach. 1993. Oxford
University Press), and stored at -85°C. Genomic DNA was isolated from the other set of
plates by standard methods (Joyner, supra). Approximately 10 µg of genomic DNA per
clone were digested with NdeI and screened by Southern blotting using a 100 bp
fragment (bp6164-bp6260) as probe. A banding pattern consistent with targeted
replacement by homologous recombination at the asth1-J locus was detected in 10 of
113 clones screened.

Production of asth1-J knockout mice:

Two of the targeted clones, cl#117 and cl#58, were expanded and injected into C57BL/6
blastocysts according to standard methods (Joyner, supra). High percentage male
chimeric founder mice (as ascertained by extent of agouti coat color contribution)
were bred to A/J and C57BL/6 female mice. Germline transmission was ascertained by
chinchilla or albino coat color offspring from A/J outcrosses and by agouti coat
color offspring from C57BL/6 outcrosses. The NdeI Southern blot assay employed for ES
cell screening was used to identify germline offspring carrying the targeted allele
of Asth1-J. Germline offspring from both A/J and C57BL/6 outcrosses were identified
and bred with A/J or C57BL/6 mates respectively.

Mice heterozygous for the Asth1-J targeted allele are interbred to obtain mice
homozygous for the asth1-J targeted allele. Homozygotes are identified by the NdeI
Southern blot screening described above. The germline offspring of the chimeric
founders are 50% A/J or C57BL/6 and 50% 129SvJ in genetic background. Subsequent
generations of backcrossing with wild type A/J or C57BL/6 mates will result in
halving of the 129SvJ contribution to the background. The percentage A/J or C57BL/6
background is calculated for each homozygous mouse from its breeding history.
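
The background calculation described above can be sketched as a simple expectation: each offspring carries, on average, the mean of its parents' background fractions, so every backcross to a wild type recipient-strain mate halves the expected 129SvJ contribution. The function names are hypothetical, and the values are expected averages rather than measurements for any individual animal.

```python
def offspring_background(parent_a: float, parent_b: float) -> float:
    """Expected recipient-strain fraction of an offspring (mean of its two parents)."""
    return (parent_a + parent_b) / 2


def recipient_after_backcrosses(n_backcrosses: int, start: float = 0.5) -> float:
    """Expected A/J or C57BL/6 fraction after n backcrosses to wild type mates.

    `start` is the fraction in the germline offspring of the chimeric
    founders (50% according to the text).
    """
    fraction = start
    for _ in range(n_backcrosses):
        fraction = offspring_background(fraction, 1.0)  # wild type mate is 100% recipient
    return fraction


for n in range(4):
    print(n, recipient_after_backcrosses(n))
```

An intercross between two mice of the same generation leaves the expected fraction unchanged, for example `offspring_background(0.75, 0.75)` is 0.75.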

Molecular and cellular analysis of homozygous mice: 

Various tissues of homozygotes, heterozygotes and wild type littermates at various
stages of development from embryonic stages to mature adults are isolated and
processed to obtain RNA and protein. Northern and western expression analyses as well
as in situ hybridizations and immunohistochemical analyses are performed using cDNA
probes and polyclonal and/or monoclonal antibodies specific for asth1-J protein.



Phenotypic analysis of homozygous mice: 
A/J, C57BL/6, wild type, heterozygous and homozygous mice in both A/J and C57BL/6
backgrounds at varying stages of development are assessed for gross pathology and
overt behavioral phenotypic differences such as weight, breeding performance,
alertness and activity level, etc.

Methacholine challenge tests are performed according to published protocols (De
Sanctis et al. (1995). Quantitative Locus Analysis of Airway Hyperresponsiveness in
A/J and C57BL/6J mice. Nat. Genet. 11:150-154).

Targeting at asth1-J exon C: 
Assembly of exon C targeting construct: 

A 3.2kb HindIII-XbaI fragment (bp11515-bp14752) from MW1-J was isolated, 
cloned into the HindIII-XbaI site of pUC19, isolated from the resultant plasmid as a 
KpnI-XbaI fragment, inserted into the KpnI-XbaI sites of pBluescriptII KS+ 
(Stratagene), and the 4.5kb RsrII-MluI fragment isolated. A 3.4kb HindIII fragment 
(bp17217-bp20622) was isolated from MW1-J, inserted into the HindIII site of 
pBluescriptII KS+, reisolated as a XhoI-NotI fragment, inserted into the XhoI-NotI 
sites of pPNT, and the 9.5kb RsrII-MluI fragment isolated. The two RsrII-MluI 
fragments were ligated together to produce the final targeting construct plasmid, 
Asth1exc. Asth1exc was linearized by digestion with NotI and purified by CsCl 
banding. 
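
As a consistency check on the coordinates given above, the length of each homology arm can be computed from its bp range. This is a small sketch, assuming the coordinates are inclusive at both ends; the function name and dictionary labels are illustrative only.

```python
def fragment_length(start_bp: int, end_bp: int) -> int:
    """Length in bp of a fragment given inclusive start and end coordinates."""
    if end_bp < start_bp:
        raise ValueError("end_bp must not be smaller than start_bp")
    return end_bp - start_bp + 1


# Coordinates as stated in the text; labels are illustrative.
arms = {
    "HindIII-XbaI arm": (11515, 14752),
    "HindIII arm": (17217, 20622),
}

if __name__ == "__main__":
    for name, (start, end) in arms.items():
        length = fragment_length(start, end)
        print(f"{name}: {length} bp (about {length / 1000:.1f} kb)")
```

Under that assumption the two arms come to 3238 bp and 3406 bp, in line with the stated sizes of 3.2kb and 3.4kb.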

Identification of targeted ES clones: 

Approximately 10 million RW4 ES cells (Genome Systems) were 
electroporated with 20 µg of linearized Asth1exc and grown on mitomycin C 
inactivated MEFs (Mouse Embryo Fibroblasts) in ES cell medium (DMEM + 15% 
fetal bovine serum + 1000 U/ml LIF (Life Technologies)) and 400 µg/ml G418. After 
24-48 hrs, the cells were refed with ES cell medium. After 7-10 days in selection 
culture approximately 200 colonies were picked, trypsinized, grown in 96 well 
microtiter plates, and expanded in duplicate 24 well microtiter plates. Cells from one 
set of plates were trypsinized, resuspended in freezing medium (Joyner, supra), 
and stored at -85°C. Genomic DNA was isolated from the other set of plates by 
standard methods (Joyner, supra). Approximately 10 µg of genomic DNA per clone 
was digested with NcoI and screened by Southern blotting using a 518bp fragment 
(bp8043-bp8560) as probe. A banding pattern consistent with targeted replacement 
by homologous recombination at the Asth1-J locus was detected in 3 of 46 clones 
screened. 
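
The screening figures above can be restated numerically. The sketch below computes the targeting frequency among screened clones and confirms that the probe coordinates span the stated 518bp when both ends are counted; the function and variable names are illustrative assumptions, not terms from the specification.

```python
def targeting_frequency(targeted_clones: int, screened_clones: int) -> float:
    """Fraction of screened ES clones showing the targeted banding pattern."""
    if screened_clones <= 0:
        raise ValueError("screened_clones must be positive")
    if not 0 <= targeted_clones <= screened_clones:
        raise ValueError("targeted_clones must lie between 0 and screened_clones")
    return targeted_clones / screened_clones


# Values stated in the text.
PROBE_START_BP, PROBE_END_BP = 8043, 8560
probe_length = PROBE_END_BP - PROBE_START_BP + 1  # inclusive coordinates

if __name__ == "__main__":
    freq = targeting_frequency(3, 46)
    print(f"targeting frequency: {freq:.1%}")
    print(f"probe length: {probe_length} bp")
```

With 3 targeted clones out of 46 screened, the frequency is roughly 6.5%.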

Targeted clones are injected into blastocysts and high percentage chimeras 
bred to A/J and C57BL/6 mates analogously to that done for asth1-Jexb knockout 
mice. Heterozygote, homozygote and wild type littermates are obtained and 
analyzed analogously to that done for asth1-Jexb knockout mice. 

The data presented above demonstrate that ASTH1I and ASTH1J are novel 
human genes linked to a history of clinical asthma and bronchial hyperreactivity in 
two asthma cohorts, the population of Tristan da Cunha and a set of Canadian 
asthma families. A TDT curve in the ASTH1 region indicates that ASTH1I and 
ASTH1J are located in the region most highly associated with disease. The genes 
have been characterized and their genetic structure determined. Full length cDNA 
sequences for three isoforms of ASTH1I and three isoforms of ASTH1J are reported. 
The genes are novel members of the ets family of transcription factors, which have 
been implicated in the activation of a variety of genes including the TCRα gene and 
cytokine genes known to be important in the aetiology of asthma. Polymorphisms 
in the ASTH1I and ASTH1J genes are described. These polymorphisms are useful 
in the presymptomatic diagnosis of asthma susceptibility, and in the confirmation of 
diagnosis of asthma and of asthma subtypes. 
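
For readers unfamiliar with the TDT (transmission disequilibrium test) referred to above, the following sketch shows the commonly used McNemar-type statistic for a single biallelic marker. It is an assumption that this standard form applies here, since the summary does not state which variant of the test was used, and the counts in the example are hypothetical rather than data from the cohorts.

```python
def tdt_chi_square(transmitted: int, untransmitted: int) -> float:
    """McNemar-type TDT statistic for one allele at one marker.

    transmitted:   times heterozygous parents passed the allele to an
                   affected child
    untransmitted: times heterozygous parents passed the other allele
    Under the null hypothesis the statistic is approximately chi-square
    distributed with one degree of freedom.
    """
    if transmitted < 0 or untransmitted < 0:
        raise ValueError("counts must be non-negative")
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("at least one informative transmission is required")
    return (transmitted - untransmitted) ** 2 / total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(tdt_chi_square(30, 10))
```

Computing this statistic for each marker across a region and plotting it against map position gives a curve whose peak marks the interval most strongly associated with disease.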

All publications and patent applications cited in this specification are herein 
incorporated by reference as if each individual publication or patent application 
were specifically and individually indicated to be incorporated by reference. The 
citation of any publication is for its disclosure prior to the filing date and should not 
be construed as an admission that the present invention is not entitled to antedate 
such publication by virtue of prior invention. 

Although the foregoing invention has been described in some detail by way 
of illustration and example for purposes of clarity of understanding, it will be readily 
apparent to those of ordinary skill in the art in light of the teachings of this invention 
that certain changes and modifications may be made thereto without departing from 
the spirit or scope of the appended claims. 

SEQUENCE LISTING 



(1) GENERAL INFORMATION 



(i) APPLICANT: AxyS Pharmaceuticals, Inc. 
(ii) TITLE OF THE INVENTION: Asthma Related Genes 



(iii) NUMBER OF SEQUENCES: 339 



(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Bozicevic & Reed, LLP 

(B) STREET: 285 Hamilton Ave, Suite 200 

(C) CITY: Palo Alto 

(D) STATE: CA 

(E) COUNTRY: USA 

(F) ZIP: 94301 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: DOS 

(D) SOFTWARE: FastSEQ for Windows Version 2.0 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 21-JAN-1998 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Sherwood, Pamela J 

(B) REGISTRATION NUMBER: 36,677 

(C) REFERENCE/DOCKET NUMBER: SEQ-4P 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 650-327-3231 

(B) TELEFAX: 650-327-3231 

(C) TELEX: 



(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 72928 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 



GCACTTTTTG GGGAAGGTGG AAGAATAAAA GTAAGGGAGG TGTGCTGAGA CTTCAATTTT 60 

AATATCTTAT TTCTTAGGTT GAGTGTTACA CAGGCATTTG TAATCATATA TACTTTTGTA 120 

CACTTGAAAT ATATATATTT GTGTGTGTGT GTGTGTGTGT GTCAGAGTCT CACTCTGTCT 180 

CCCAGGCTGG AGTGCAGTGG TGTGATCTTG GCTCATTGCA ACCTCCACCT CCCAGGTTCA 240 

AGAGCTTTTT GTGCCTCCAT CTCCTGAGTA GCTGAGACTA CAGGCAAGCA CCACCACACC 300 

GGCTAATTTT TGTATTTTTA GTAGAGATGG GGTTTCACCA TGTTGCCCAG GCTGGTCTCA 360 

AATTCCTGGC CTCAAGTGAT CCAGTCACCT TGGCCTCCCA AAGTGCTGGA ATTACAGGCG 420 

TGAGCCACCA TGCCCGGTCT GAAATATTTC AAAATGTAAA AAAGCTAAAC CCAAATCCAG 480 

ATGTCTACTT TCAAGGTGCT CACAGGTCAG ATCTAGGATT ATTGCTACTA ACTGATATTT 540 

ATTATCCCAG CACCAGCATG TTTGGCTGTG TGTCATGGGT AAGTTACTCA CCTTCTCTGC 600 

GACAGTGTCA TCATTGTAAA ATAGGGATAA AAGAGTTTAG ACCTTGCAGA GTCCTTCAGA 660 

TTAAAGGAGA TAATCAGTAC GTGGCACTGA GTACCTGCAA TATATTAAGT GGTGTGTGCT 720 

CAGAGATATG ATCACATACA GTATCTTGGA TCTGCCCAGC AACTCTATGA AGATGAGGAA 780 

ACAGACTCAG GCAGGTCAGA GCCAGAACAT AATGTTTCTG GAATTTGAAC GTAAACGTTC 840 

CCCTTTCTCT TATCCAGGCT GAGTGCTAAA GGAATTGTAA AAATGGAATT TGCCTGTTGC 900 

CTGCATCTCC CTCTCTTTTT CTTCCTCTGT GTCCTCTGAA TATCTAGCAC CAGTGGGACT 960 

TTACAGTGTT GGCCTCAATG CTGTAGGGTG CTGTGTGCAC ACTTGTCTTC AGCTCCCTGA 1020 

GTTAGCAGAG CATTGCCCCA ACTCTGCCCT CTGGCCAGCT CATGTGCCTT ACAACTTTCT 1080 

GTTGCCAGAA GAGAGCCCTG CTCATTCTCT AGACTCAACC AACAAAAGCT GCCTACCATT 1140 

TTCAGAATGC CAGTGGGCAG TGAGAAGTGC AGAGCTTGTG TCCTGAGCTT GGCAGCCATC 1200 

TTGCTTGGTG TTAACAAAGA GTAATTAAGT GATCTCATAA AACTCAGTGG TGGAGGTTGT 1260 

GGTTCAGAGC AAGCTGGGTC AATGCCAAGG CTACTTTGGC TTCATCTGGT CCATAGCCCC 1320 

ACATTTCTCT TCTGATGGTT CAGTTCCGGG AATGAGAACC AGTCTGAGTG TAAGAAGACT 1380 

TGGGTTTGAA TCTGTCTCCT CCAATCACTA GCTGACCTTA GAAAAGTGAC TTAAGCTCCC 1440 

GAGCTGCTAT TTCCTCATCT TAAATGGTGA TAGTAATCTT TCCTTACCTT AAGGTTGTTG 1500 

AGCAGCTTAA ATAATATAAT GAGTTGAAAG CTTTTTGTAT GATCTGTTAT TAGGAGTCCA 1560 

GATAGTGTTT TATAAACAAG AGGATAAAAA AAAAAAAAAA AAAAAAAACA GGATTCTGAA 1620 

GGCTGGACTC ATTGCATTCC TTGCAAACTA CCCACTGAGC CCCAACTCTT CCGTCAGCTC 1680 

AAAGTCACTT CTCAGAGCAA ACCAGATTGT CCTGAACCCA GCACTTGCCA ACATCTCCTC 1740 

CTCTTCCCTG ATGAAAACTC TGGGCTGGAG TTGTGGTGGG TGAGGGGAAG GCAGGATAAA 1800 

TCAAAAATTG ATGTTTTAAG AAAACTATGG TATTCTTGGA TGCAAAGGCA TGAGAATGAT 1860 

ACCTTAGACT TTGGGGCTTG GGGAAAAGGG TGGGGGGTGG CGAGGGATAA AAGACTACAC 1920 

ATTGGGTTTA GTGGACACTG CTCGGGTTAT GGGTGCACCA AAATCTCAGA AATCACCACT 1980 

AAAGAACTGA TTCAGGTAAC CAAACACCAC CTGTTCCCCA AAAACCTATT GAAATAAAAA 2040 

CAGAAAATTA AAAAAAAGAA AACCTATGGT ATTCTTGGAA GAAGCACAGT GGTGAAGTGG 2100 

AGTAGACACA GATGTGGAAG TGATGTGAAC TTTGGTAAGT TGCTGAGCCT CTGAGGATGA 2160 

TTTCCCTCAT CTGTCAATCA GGGAACAAAA TCCCTTACTT GTACAATGAG TATTATAAAG 2220 

ATCAATTCAG ATGACGCATG TAAAGATGCA ATGTGGGACT GGTAGGTAGT AAGCATCCCA 2280 

TAAATGGCAG CTATTAATAA GTAATAATCA CCGAGTGGTG GGCTGCCTTT CATGAAAACA 2340 

TTCCCAGCAA GCTGCTCTTC TGTCGGCTCA AAGTCACTTC TCAGAGTAAA TGAGATTGGC 2400 

CAGTTCTTTC TTTCCAAGGC TTTTCTGGAT ATTCATTTGT CCCAGATTTC TCCTGTATAC 2460 

AAAGCTCAGG AGTGAGGACC CCCACAGTGG GGCTTGCACA AGGATAGCCT TGGGGGGCTT 2520 

TTTCTAAGAG CTATGACTTT GAATGCTCTC TTCATCGATG CTGACAGATG AGGGCTGATG 2580 

GAAGTGGTCA TGTTTTAAAA TGTCTGATGT CCAGAAACAC AGAGATGTGT ACGCAAAACA 2640 

TTCATTCATT CAAGATGGAA TTAGTGCCCC AGACACAGAG GCAGGGGATA AATAGCAAAC 2700 

AAGGCTTGAT TCCTGCCTTC ATAGAGCTTA CTGTCTTGTA GGGGAAACAT GAGTAAATTC 2760 

AGCAGAGTAA GGGCTCTAAT TGGGTAAATG GGGGCTAGGC TGCCTGTGTC CTTGGGGTGG 2820 

TGGGAAGGCT GCTGATCTGG GGTGCCAGAA GACCTGAGTT TTGATGCAGG CTCTGTGACT 2880 

TTGAGCAGGT CGTTTCCAAC TTCTGAGCTT CCATTTCCCT AGCTGAAAAT GGGGGCTTGC 2940 

CATACTCGAT GCTGTACTCT ATGAGTCTTT GCAGCTCTGT CATCTTTTTT TCTTTTGGTC 3000 

ACTCAGAGAC TCCAGGATTG GGAGAACAAC CTGCATTCTG ATTTAAAGTG TGAATCTAAT 3060 

AATTTCAAAA AGAAAGGGAC TAAAAGGGAC AAACTTGTTT CTGTTTATTT TCCATCCTTC 3120 

TTTGGGGAAG TGTAACATTT GAAATCAAAT TCTCATTGGC TTAGCCAATG TGTAGACTTC 3180 

GAGGGGAAAT TCTCACTGCC CAGAGAAGTG ACTAAAAATG ACCATTACAG CCAAAAAGAG 3240 

AAGTTTTTTT TTTTTTAAAA TCTGTGCTCT ACAGATGGAT GAAGTGCTGC TGCACATGGA 3300 

CAGAGTGGAT CTGGACATTC TGCATGAGCC CAGGGATCCT GAGAATGGAT TGGCTGAGCA 3360 

TAGACAGGGT GACCTATCGA TGTTCACTGT GGTCCTGATC TATGTGGCCT CTTCCTAAGG 3420 
GAAGATTTTT CTTAAGGTTG TTTCCTTTCT CAGCAGATAT TTGTGAAGAA 7VCTGTATCTG 3480 

TAGTCTCATT TTGTCCTTAT AATGACCCTG ATGGATGGGA GGTAGAGGGA TGATGATCAG 3540 

TAAGAGCTGG GAAAGCACCA GGAACTAGCA AGAGCAGGAC ACCTTTTCCA CCACTAGGTA 3600 

AATGGACCTA GTGACTGCTG GCACCGTGGG TGAGGGGACT GCCTGGCAGG AGCTGTGGCC 3660 

GTAGCTAGGG GATTACAGCT ACGGCCACAA CTCTGGCCCT GTACGGAGGG AGTGGGGGAA 3720 

ATAAAGAGTT CATATCACTC CCCTCTTTCC CTGGAGTCTC CTGCTGGTAC CTTGCATTGG 3780 

CTGAGTCTAA CTGGAAGCCA GAGGGCAAAG GAGGTACCCT TTCCAGCTCT GCAATTCTCT 3840 

TCAGACAGGG CTGGGATTTC TGGAGAGAAT TTGCAGAATC AGAAAGCAGA GCTTTCCAAT 3900 

CAATGCCAAG CAAGAGACTC TGCAGACTCT CATAGCCTTG GGACCTGAGA AACCAGGTAT 3960 

CCAGTGAGCA GTCACTTAAG CCTGTTCACC TGGCCCTCTC TTACTTTCTC TCCTATAGCA 4020 

GCAGCAAAGG AGCGATGGGC CGAAGGGACT TGCTGGGTAG AAGTGGACCC ACATTCTAAA 4080 

AAGGAATGGA AGAGAAACCT GATTTCTTTG ACTCGCCCTG TCCCTGAAGA TGAGGGGCAG 4140 

GCACAGACCA GCCCTCTCCA GAAAGACAAA TATATTCTTC CATTCATGGG AGGGGTAGTA 4200 

GAGACTAACA TTTGTTAAGT ATCTATTACA TGGGGGGTAT GGAGGTAGGC CCTTTGTGTG 4260 

TGTTGCCTCT TTTAATCCTT TGGTGATCAA CTCATGAAAA TAAACAGCTC CAGAGCCAGC 4320 

TGTCTTTGGA GGGTGTAGGC AGGCCCGGCT CTGGGAAACC TGGTGACACT GACCTAGTTT 4380 

GACTTCCAAA TCTTCTCTCT TCTTCGATTC TGGTGAGCCC CACTCTAGCC CCATAGTATG 4440 

TATGGCCAAG CACCCAGATA CTGCTTCCAT CAGGAGGAAA TAACATACCT GATGAATTTC 4500 

TTCACTCAAG GTGTTAGGAG CTTAATGTGT TTCCCCCGCC CCCCGCACCA AGAGAATTTG 4560 

TGTTTTCCAA GACAGTCAGA GAGTGGGTGG TGCTGAACTC AAAGGAGTGA ATCACTAATA 4620 

GTGGAATCCC AGGCATTCAG GGAGGTCCTA TTTCTGGGGT GGGTTCCTTC CTGACACTTC 4680 

ATTTTCTACA j^qqt^qqqj^q CCACCTATTG TCTCCAGAAA GGAGGCTGTC CCTGTGGGTG 4740 

TGGTGACGGT GGGAAAGGAG AGGCACCTGC AGGCTGAAGC CAAGATCACC TGATTTTCAA 4800 

AACCAAATCT GTCCCTACAA AGGAGAAGTG GCTTAAAAAT CCACACAGCC TCCCGAGTGG 4860 

AGGGAAGAAT TCCCTCTCCT CTCTGGAACA GGGTTCCCTT CACCCAGAAC ACGGTGCTGT 4920 

TGTTATGCAA TGTCCCTGTT GGCAAAGATA TTTGAGCCCC TTGTTTTCAG GTCTGTGTCA 4980 

TTTCCAAGAA AGAGCTGTGG CCTTTGAGTA GGACTGGGCT CCTGAATAGG GTCCCTGGTG 5040 

CCAAATGAGG GAGCCAAGAA AAGGCAGAGA AGAGGAAAGT CCTGACTTTT ACATGAAGAT 5100 

GAGACAGCCA GCCCTGTGGC AGCCAGATGG CAGTCCTGTT GCTCTGTAGT GGCCTTGGGG 5160 

TCAGACTAGG GGCAGAGCTG GGCTGAAGGC AGGAAGGCCA GGACAAGACA GGTGAGAAGG 5220 

GCAAAGTCTC CTGTAACCTG GTGAGAAAAT GTGGGCTAAG CCATTCTCAT CTGGAGCTGA 5280 

AGGCTTGGTG GAGAATGGCC CTCAACATTC AAGTTCACAC CCATGGATTT ATAAAAGGCA 5340 

GGGCTGGGGG GAAAGGTTTT TCCCATTATA CTTAATAACA TTATCAACAA CAATAATCAC 5400 

TACTATCATT TATTGAGCAT TGACTCAAAA GACAGTCCTT TTATGAAAAT TATTTACTTA 5460 

AATCCTTACA AAGCTTCTAT TCATTCACCC AACACATATT TATTGAGTTC CTACTATGAG 5520 

CCAGGCATTA TTCTAGGTGC TTAATTTAGA TCAAGGGACA AGACAGACAA AATCCCTGTT 5580 

CTGGTGGCAG GGCTACTACA TGCAATTAAC AGCACACAAC TCTAGGGGGA GCCACATACA 5640 

TGGGCCACCT TATGAATGGT GTGCCCTGAG GTTAAGCATC CTGGCAGCCC CTTTCTGTGA 5700 

CATTTGCATT CTAGTGAAGG GAGTCTAATA CCAATGAAGT AGATGTCATT ATCCCCTGAC 5760 

TACAGTTTAG GAAACAGAGA CACATAGGAA TTAAGTAACT TGCTGAGTTT TTCAGCCAAA 5820 

AATGACTGAC CCATGATTTA TACTGAAGTC AGTCCTTGCA ATTCACCTGT GCCACGTACT 5880 

TGCCTTTCTC TCCCTGGTGG GCACAGGGAA GAGGGAGTAG CCAGGCTGGC CAGATGAGTG 5940 

CTGGGCTGGC TGGCCCAGTA GAGGCACCAT GTCCTGACTG GGTGGACAAA GACTGGGTAG 6000 

GAGGTAACAG AGAATCCCTT GGTGAGTCTA ACTTAGCTAT AAGAAGGCTT GCTGAGAGCA 6060 

GCTGCCTCCA TGCAGAGGGT GGGGTGACCG GCCTTTAATC CTTCCCAGCT GAGGATTTAG 6120 

TCAAAGAAGC TTGTCTCTGG GGATAGCCTA TGGTCTTGAA GGGCCTGAGT TAGCTATTAG 6180 

TTCACCCATT TATTTAACAT TCATTCATTA TTTTTAAAAA ATTTCCTAGC TATGTTTGGG 6240 

GGCAGAGAAG TGGGTCCAGA GACCTAGAGG TTTGCAAGGG TAGCTTCTAA ACTCCTTTGG 6300 

TTCAGAACAG AATAGAAAGT GTCCTCGGGT GACCTTGGGT CTGCTTCCCA AGCAAATTGA 6360 

GCATACGCAG CCAGAACAAA GACTGCACTC TACTCTAGTG AGCTCAGCCT GCTAGGCTTG 6420 

GATCTAGATT TTATAGCAAT AAGCTTGGAG TCTCACCTTT GGGTCAGACA GAGTACTACC 6480 

CCAGACATGA GGTAGGGAGA GCCTAGTCTA TATTCCTCTG CCTTTGTCCA AGCCTGCTTT 6540 

GTCCTTCCTC TTGACGAGGA ATAAAGATGG CTTCTGGGTG TGCATCCCCT TCCTTCTTCC 6600 

ACCTGCAGAT GTACCTGTTT GTGTGCAGTG GGCTTCTGAG TCCTGGGCAG GGATGCCAGA 6660 

GACCGCAAGC CAGATGCTTG GGATGCCAAT CCTTGGGACT TTGAGGAGAA AGAGAGGTTC 6720 

TGAGGGGCAT CTGTCTATGG CACAGAGTCA AATGGAACAC ATGGAAGTCC CTTAGAAGGC 6780 
TGGTATCTAA GTGTTGGCCA CACAATGTCC GTTCTTCCTC CATTATTTGA ATTTCTCCTT 6840 

CTCTATCCTT CTATCTTTCT TGGCACCTTG AGCCAGGTCT GGGGTGAGAG AAGGGATGGT 6900 

GTAGGTGAAT TAGTGGTAGT TATTGGAGGA AGGCAATAAA CCCAGAAAAA GTGTCACGTG 6960 

ACTTCTTTCT TGGGCCCAGT GTGACGCTTC TAGTTAGGCT AACGTGGGTC TTGGGACTGT 7020 

TCCTGAGATT TTGTGGAAAA CTCTTTGTAT TTGTGCTGGT AACAGAAGGA AACCAGAGTT 7080 

AGGGCTGGTG GGATGAAGCA GTGGGAACAC TGATTTCTCC TTTTTTTCAG ATTCAGGGAT 7140 

TTCTGTCAGA GACATCCGTG GGGGAGGGAT GGGATTGGGA GTGAGGAGAA TCCCTTTCCT 7200 

CTCCTCTCAC CATCTGGTGG TCCCCGTGCC CACGCACCAG CTCGTTGGAT GGACATTTTG 7260 

ATTCCCTTAA GATGTACATT CTTCAAATCA TTGTTTGTCA TTAGCTCCCT GGAGAAAATG 7320 

GAGGGGCTGA GATATTAGTG AGAAAACATA AAGTTAATTG GGTGATGGAG ACTGGGAGAA 7380 

GGGGAATGTT AGAAGAAAGT GAGCGAGGTC TGCTAAAAGT GAACTTTATC TTCTTCTCAA 7440 

TTTTGCCTAA GACTCGTGTT GCCTGGGCAG TCTCTTTTTG GAAGAGAAAT TTTCATGACA 7500 

GTTTGGGCCA GAGATGGCAA ATAAATGCCT GACATGGTTG CTGCCAGCCC CTGTCTCCCG 7560 

ACACGTTCAC AAGGGTGCAC ACCACTTCTC CTCTCTGTGA CCATAGACTC AGACCCATTG 7620 

CAATCCAGCA TCCTGCATGG CCCCATTGGT CAGAGTTGAC ATTTGCAATG AAGCTGCTTC 7680 

CCTATGCCTG GTTAGGCCTT TTGCTATGAA TTCTCTGGAG TTAACTATTT CCAAGGGGCT 7740 

CCAACTTATT CTTGTGATTT CCACGGGATT TGGAGCCCCA GAAGACAATC CCATGTGGAT 7800 

TCACAAAATG CCCTCTAAAT TTGATGGCTG TCAGTGCATA CTAAGTATGA CTGACTCACT 7860 

GGTATCTGTT TCCTCCGCTG ACACAGCTGG TTCTTAGGCT CGGCAGGAGT TTGGGCTGAG 7920 

ACCTCTCATT GCTCTATATT CCCTCTGTTA CTAATGAGGT GTTGTTCCTT AATTACTAGG 7980 

TGCTGGATAC TAGAATTGCT TTTCTTTGTT TCAGGGGATT TAGCAAAGGG CTTATAAATA 8040 

TTTCTTGTGT CTGGCATGAA CTACCTGATT TTTTTATTCT TCAGGTCACT GAGCTGGCAA 8100 

TAAAGGCAAC TCAAAGTTAG CTGGGAATCA GAATGAAGGG GGACTAGGAA AAGTGATGCC 8160 

TAGAACACCA ACAGGTGTGG GATCATCTTC ATTGTACCTT TCAGAGCCTA AGATATAAGT 8220 

CCTCTGGATA CTCTCTGCTT GTTTATTTAA AGGAAAAAAT AATCAGAATG TGGGAGAAAT 8280 

GGGTGCTTTG GGTAATTTCA TATTCTAATT GATGAACGTG TATGAAATTA TAATATTAAA 8340 

CCACTACTAG CCCTTGCCGT AAAAAACTAT TCCAAAATAG CTGAGTCTAA GTTTCCTGCC 8400 

TCAGTGTGTC CCACCTCTTG CGCTTGAGTC CTTAATGATC CAGAGTTTCA AGTCCCCAGT 8460 

GCCCTAATCT TGAAAAGCAG AAACTTTAGA AGTTTGCTGA AGTTTATTAG TTGGCTATAC 8520 

GATCCATCAA GAAATTGACT TTTTTGGATT AAATTCAAGA TAGTTTTTAA AAAATCAGAA 8580 

GTTTCTTTAT CATGAAAGCT AAAAAAATAA TTGAAGGTAG AGGCTAGTTG GAATCCCAGT 8640 

TAATAGATGG ATTTCTTCCT TCTTGAAGAA ACTTGTGTCC AAGGGCAAAC TGAATCCTGG 8700 

TGGTCTATGC TGGCCACATT CAGCAAAAAA TGGCCCGAGG TTTTGATGGT TATCATTCTC 8760 

AAAACTGTTC CTGCCAACAC ACTCTGATCC CAGGAGGTTA CCTGACCTTT ATAAGGCTCA 8820 

GTTTCCTCCC CTGTAAAATG GGCAGGGTAA TCAAGCTAGG CAAAATATTT AACCTAAGTG 8880 

AGGAAATTGT GCTATTAGTG CCCTGAAAAA CATGTAGAAA GACATTAGAC ATTATTTTAT 8940 

TTAATATCAT GTTGAACTTA GTTTTTAAAA AGAAGACCTA TTGGATTTTC CAAGAAGAAC 9000 

TAAACTGATT CCTTGTAGAC AGTTTAGAGA ATACAGAAAA TTAGAAATAG GAAAAAAGCA 9060 

AAACAAAACA AAAACCATCA AACAAAGTCT ACGCAAATAC AGTTTCTCTT AACTTTTGGT 9120 

TTATTTCCTT CTAGTCATTT TTTAGGTGCA TTTTTAAATT GTGGTAAAAT ATATGTAATG 9180 

TAGAATTTAC CATTGTAGCC ATTTTTAAGT GTAGAGTTCA GTGGCATTAA GTACATTTAT 9240 

ATTGCCGTGC AACCATCACC ACCATCTATC TCCAGATTTT ATAACCCCAG ACTGAAACTC 9300 

CATATCCATT AAATGATAAC TCCCCATTCC CCTCTCCCTA CCCTGGTGAC CACCATTTTA 9360 

CTTTCTGTTT TTATGAATTT GACTTTCTTG GCGCCTCTTA TAAGTGGGAT CATTTTTAGT 9420 

TGTTTTTATA ATCGGTTTCC TTCCTTTAAA AATATGAATG GAGCCTAATG AATATTGAAT 9480 

TTAGTGTACT GGTTTCTTTG AACATTTCAG CATCATAAAC ATGTTTTTGT ATTCTACATT 9540 

CTTCTTGTAT TGCTATATTC TCTATAGGAA TTTTTTTTTT TTTTTTGACA GAGTCTCACT 9600 

CTGTTGCCCA GGCTGGAGTG CAGTGGCACA ATTTCAGCTC ACTGCAACCT CCGCCTACTG 9660 

GGTTCAAATG ATTCTCCTGC CTCAGCCTCC CAAGTAGCTG GGACCAGAGG TGCATGCCAC 9720 

CATGCCTGGC TAATTTTTGC ATTTTTAGTA GAGATGGGGT TTCATCATGT TGGCCAGGCT 9780 

GGTCTTGAAC TCCTGACCTC AGGTGATCCG CCCACCTTGG CCTCCCAAAG TGCTGGAATT 9840 

ACAGGTGTGA GCCATTGGCC CCAGCCTTGA ACATCATTTT TAATGGCTGA AGATTATAGA 9900 

ATCCAGTGGG TGTGCCATCC ATTATTAGTA TTCTGTTGTT TCCAAATATT TGCTGTTTTA 9960 

AACAGTGTTG TGAAAACATA TTTTTGTGTT GAACTTTTAT CATATTGAGA GGCACTTCCT 10020 

CTGTGCAGAA TCAAGAAATT AATTACCGGT TTATAAGGAA TGTGAACCTT TCAGGCTCAT 10080 

AATCTGTATT ACCAAATGGT TAGGAAAAAA ATGTTCAGAA GGTGCCATTC ACAGATGGAG 10140 
TGGGCTTCCA CCAGGGGCTG TGAAGCTCTA ATCTCAAAGG ATGTTGACTA CTGGTAGGGC 10200 

TGATTCAAGT ATTAGATATC TAGGAAGGGT GGGAAGGGCA GAGAAGCTTC CAAAATTCCT 10260 

ATGTAGGAGA GGCATAGGGG TGCTGATCTC TTCATAAGGG GTGACGGGAA TTTTCCTTGA 10320 

AACAGCATGT GCAGATCAAG CACTGTTCTT TCCTTTAGAG TGTGTGTTTA TTTGGGGCGA 10380 

CTTGGAGGGT TGCTAATTGA GATTATGGGG AATCTAAAGC CACACCCCAA ACCGCCCCTT 10440 

GGTTCCCCTA CCTGGGGGAG AGTTGACACT AGTCAAACCT CTCCCATCTC TGAGATTTTG 10500 

TGAATCTAGG ACTCTTGCCA CTGCACAGAC TCCAGCTGGA CCCAGGGACT CCAGCTTCTC 10560 

ACATCACCCT GGCTCATCCA TAACTCTCTT TTGTTTCATC TCAAACATCA CTGAGAGATG 10620 

GCTGCCTCTT CTCCCTTCCT AGGAAAGCCC ATGTCACAAT AAGCGCGCCT GTGCTTCTCA 10680 

TCAGTGCTTT CCTGGTAGCA CCACCTGACA AACACTGCTC GCGGCTGCCT TCAGCTGCTC 10740 

TCCAAGAAGA CGTCATAACC ACAAGAGATC TGAATCAGCC CATTTTTTCC CCTGTGGCAC 10800 

TGTGTGCTTT GGCTGCCTGG CCAGAAAGCT GGGACTGTAT TTACCTATCA TTTTGATACT 10860 

ATCTTGGGGT GTAATTGGAA TTGAGCTCTT AGTGTGGAAA TTCTTACTCA GAACACAAAG 10920 

GATTGAAGAG TGCTTGGAGG CTGAACTCTG GAAGGACTCT TCCCTGAGGC CTCTTGGCAT 10980 

CTGGCTCTTG TTTCTTGGAG CGGTGGTATG GCCCACAGGT GGGTGTTTCC TTTGGGAGCA 11040 

ATTTCTTGCT TTTTCAGTAG CTCTGGGCTG TCATCGAGCC CACTGTTCCT TGTCTTCTCT 11100 

GCACTGTTTA GTGATGATGT AGGTGAATTG CTCCACAGTT TAATTCCAGT GGTAGAGCAG 11160 

TCACCATTTG TTGGTTTCTT TTTCTTATGG GAACTCTGGT CTGCATCTCA CTGTGTTTCC 11220 

CTTGAACGTG TCTGGGGTCC TCCAAACAGC TTCGTGTCCC TCTGAGTGCG GACACTCAGA 11280 

TTCTAACTCA GATTCTAAGT CAATGGTCTC AGCCTTTAGA ACCGCAGGAG GCCAGGCGCG 11340 

GTGACTCACG CCTGTAATCC CAGGACTTTG GGAGGCCTAC GCGGGTGGAT CACCTAAGGT 11400 

CAGGAGTTCG AGACCAGCCT GGCCAACACA GTGAAACCCC ATCTCTACTA AAAATACAAA 11460 

AATTAGCCAG ATGTGGTGGC ATGTGCCTGT AATCCCGGCT ACTCAGGAGG CTGAGGCAGA 11520 

GGCAGGAGAA TCGCTTGAAC ACGGGAGGTG GAGGTTGCAG TGAGCCGAGA TTGTGAGATT 11580 

GTGCCATTGC ACTCTAGCCT GGGCAACAGA GTGAGACTCC ATCTCAAAAA AAAAAAAAAA 11640 

AAAAAAAAAA AGAACCACAG GAGGGAGAGA TCATATATGA CCCCGTATGT GTGAAAAGTC 11700 

CTATCATTGC TACCCACACC AACAATATTA GTGGAAAAAT GTCTTCAAAG GACATTCGAT 11760 

TCAATGATAC ATGAGATTTG CTTCCTTCCT TAATTTTTCC CTGTACAGCT ATATAATGAT 11820 

TTTTTCAATC AGATCCTCTT TTCCCCCTAT TAATTGTATT TATAGGATGA GATTGATTCT 11880 

AACACAATAG CAAATGATGT ATGCACATTT AACACATTTC GTGAAGGCAG GAAAGGGCAC 11940 

ACTATAAATT CTGTGAAATC CACATTAGAT CATGCCTCTC CTTTCTCAGT TGGGAGGTGG 12000 

GCTCTGACAG TGCTCAAGAG AAAAAAAAAT CAAGTTGTGA CAGTTTAAAA AATATTTTAA 12060 

ATATTAAACT ATTTATTATG GAACTTAAAA CATACACAGA AGTTGGCAGA ATAACATCAT 12120 

GTACCCTAAA TATCTATCTC CAAACCTCAA CAGTGATCAA CCTGTGGTCA GTTCTGCCTC 12180 

TTCTGGTTCC CATCTGCTCT CTGACTTCAG TTTATTTTGA AGCATGTCTC AGACATCTTG 12240 

TGACTTCAGT ATTGCACGAT GTATGTCCTA AACGTAAGCA TTCCCTTTAA AACATGTATC 12300 

TACTTTTTAA ATGAAGAACA ATTAGGTGCA TTTTCATAAG GGTTTTAGAA AGGGAAGAAA 12360 

CTGTATTTCT TTAATTTAAA AATGTATCAG ACAACTAATC CATGTTTACT GTTTCTAACA 12420 

CGGATACCAT AATAATAGGA TCATTCTATT ATACATAGAC TAGTGAGATC AATTTGTCAG 12480 

ATAAACTTAG AAGGGCCATT AAGAAAGTTA TGTCATAATT TTTGTCACTT GCTGAAACCA 12540 

AGACTTTAAT TCTGCAGAAC ATCATACCAG GATTCACAAT TGTATACACT GATTGTGTTT 12600 

GTCCAGAGGT AATCTCAGAT CCACTGTATA TAATTTTCCA TTTGCCTAGC TATGGGGTTG 12660 

GACACGTCAG TTTTTTCCAG ACCAAGGGTC TCCTAGCTTT TTTTTTATTT TTATTTTTAT 12720 

TTTTTGAGAC AGAGTCTCTG TTGCCTGTGC TGGAGTACAC TGGTGCGATC TCGGCTCACT 12780 

GCACCCTCCA CCTCTCAGAT TCAAGTGATT CTTGTGTCTC AGCCTCCTGA GTTGTAGGTG 12840 

GGACTACAGG CACCTGCCAC CATGCCTGGA TTTTTTTTTT GTTTTTTTGT ATCTTTAGTA 12900 

GAGATGGAGT TTTGCCATGT TGGCCAGGCT GGTCTTGACC TCTTGATCTT AGAAGATCTG 12960 

CCCACCTTGG CCTCCCAAAG CTGGGATTAC AGGCATGAGC CACTGTGCCC AGCCTCCTAG 13020 

CTGTTTTGGC TGCACACTTC TATCCGTAGA TAATTAAGCA TGTACCCTTA CTATTTTCCG 13080 

CAATATAAAT TATTTACTTA TAAATTACAT TATGTACTCT ATCACACTGG TAAATTAAGT 13140 

ATATTATAAA ACAGAAACTA AAAGTATGAA GTGAGAATTA AAAATGAATA GCAATTCTAA 13200 

TATCTTCATC TTCCCCTCAG TGGATCCTCC TGTACATACT CCAATTTGCA GACCACTGGA 13260 

GGAGGCTGTA GGAGGCAATA TTATATCCCA GTGAGGTGTG TGGGTTGTAA AGCCGAACAG 13320 

CCTGAGTCCA CATCCCAGCT CCACCACTCC TTAGTTCTGT GACTTGGAAA CATCACTTAA 13380 

CCTCTCTGAA TCTATCTTCT CACCTGTAAT ATGAGGGCAT TAACCCCTTA CAGGTTATTG 13440 

TAAGGTTTCT TACACTGTGC CTGTGGTAAG CATCAATACA TTTTAGCCAA TAATAACAGT 13500 
AATGATAATA ACACATTCCT AGAGGGCTGG GATGGATCTA GATTTTTCTT CCCCTTTTAG 13560 

TGGAAGACCA CAGCATGATG CATGAATTTA CATTTCCTCA GACATTCTGG TGCTGATGAA 13620 

GGTAAAGATG GTGAGGCTGC GATGATGGTT TCAGGGATGG GTGTGTTGGG CGTGATGAAT 13680 

AGCATGATGC ATATTGTCAC TCATTTAGTT TATCTGCACT GATGATGATG CTGATTATAT 13740 

GATGACTGTT ACAGGGATGG TCACATTGTG GGTGATGAAT ATGACCAGAA AGGGAAGACT 13800 

TTCACAGTTC CTACCCGAAC TACAACATCG ATATTTTCAT TTGTCTTTCC TAGGAACTCT 13860 

TACCTTAATC ACCTGACCAA TATGCTGACG ACTAACATGT TGCGCCCTGC CTTTCTTCCG 13920 

GGCCTCTCTG CCTTGCTGAT CTGTTTTGCT GGTGTGCCCT CCACTGTGCT CTTGGGTCTT 13980 

TGTCTCTCGG TAAAGCCTAG TACTGTGGTT GCTGTACACA AAACCTGTAG ATGATTAAGA 14040 

TCTCTGTTCA CTGCAGGGCC ATTCATCTCC CAGCAACTAT TTTATCCTTA AGTCAAGAGA 14100 

CTTGCCTCTC AGCCCCTGGG GACCATGGAA AGAGTGCTAG AAACCTACAG AGTATGACCC 14160 

TTTGTAGCCT TATGCAAGAA GTGACCTGTG TCTTTCCTGT CATGAGAGAG GACAGACATT 14220 

GCAGGAATCA AACGCATAAC ACTAGTGCAA AACTGGGGAT AATGCCCAAA CCTGGTTAGG 14280 

CAGGGGCGCC TGGAACATGC TTGTCCAGGA AATCTTCCAC TCAGTTCTGC TGCCTCCATG 14340 

TCCCAGATGA TCACAGAAGC CTCCTGAGAA GGGTTGAATC CCCCGTCGCC TGGGGATCCC 14400 

AAGAAAGCTG CAGAGGAAAG ACTTTCTCTT CCAAGATCAG AACAAAGGAC GGTTAGCATT 14460 

GTGCCCAGTA GTGCCAAAAG GTAAGGTTGG GTTAAAATAA GAATTTGCCT TAAGCTCTTT 14520 

TCCCGGGGGC TTGTTTTTTT CATTAACCTT GTTGGCTGGA CTTTAGGGAA GTATGCACCA 14580 

TCTTCTCCAG AAGTGCTTCA GATTTTATAT TTTTAAGAAA TTCAAGAGTC TGAGTTAGGC 14640 

ACTTTAATGT AACCTCCCCA AAGCTTTTGT TCCAGGAATT GACTTGGGGA TTAATCTGTT 14700 

TAGCAAATTC TGACACAGAG GCATCTCATA ACCTTTTATT TTTTCTACAG ACCACATTGT 14760 

ATCTACCTGG GATGTTTTGA AAATGAACAG TGACACCTAA GAATGTATAC TTATCTCTTC 14820 

ATGCCAATTC TCCAAACTGG ATGTTGCCCA TGTCTCAAAA TTACTTGCCT CCAATTTTAG 14880 

GGCATAAAGT GTGAGATTCT GTAGCATGAG ATCATATGCT CTTAAAATAC TAAGTATATA 14940 

TAAATTATCC CTTAGCATCT TTAACATGCA ttcttttTTT GTAGAGACAG TATCTCTACA 15000 

AAAAAATCTC TCTGTATTGC TCAGGCTGGT CTTGAAATCC TGGGCTCAAG AGATCTTCCC 15060 

ATCTCGGCTT CCCAAAATGC TAGAATTACA GGCATGAGTC TCCACACCTG GCCTAACATG 15120 

AAATATTCTT TAACAGTATT CTTTAGGATA ATATATTATT CTATAGATTT GAAATAATTT 15180 

ATCAGTTCTA TACTTAATTA TAAATACTCT TGGGAATAAA ACATACTTAT CTAATAAGCA 15240 

AACAGTCGTG CTATTCCAAA CAATTTGGGA TTGCCTTTCC AAGCATTTTT TGGGGGTTTC 15300 

TTCAACTGAT TGAGAGACCC CCGGCCGGGG AAGAGAAAGA GAATTTGATT TGTGACACTG 15360 

ATGGAATGGA CTACAACCTT TTGGTGGTGA CTCTACTGGG GACTTGTCAC AGAGCTTATT 15420 

TTCTAAACAG ATGTGAAAAA TGAAAGTCAG GCTGCTGTCT GGTTGGTAAG ATAAAGCTTT 15480 

CATTAATACT TGGCAGCATT ATTTTAGCTA AAGTGTCAGA TCAAACGCCC ACATTATCAC 15540 

CTCCCCTTCC TGATTCCAAC CGCCCATGAT AGAAAAGAAA TAAAAGACTA GGAATAGGTC 15600 

CATCAACTGG TGAATGGCTA AACAAAATGA GGTATATACA TACAATAGAT GGTTATTGAA 15660 

TCACAGTAGG GAATGAAGTA CTGATACATG CTACAATATA GATGATCGTC ATAAACATCA 15720 

TGCTACGTGA AAGAGGCCAG ATGCAAAAAT GTCACATATT ATATGATTCT ACTTATTTGA 15780 

AAAACTCAAA GTAGGCAAAT CCATAGAGAC AGAAAGCAGA CTGGTAGTTT CCCAGTGCTG 15840 

GGGAGAAGGC AGACAGGGAA GTGACTGCTT AATGAGTATG AAGTTTCCTT TTGGGATGAT 15900 

GAAAATGTCT TGGGACTTAG ATAGAAGTGA TGGTTGCACA ACACTGTGAA TGTAGTAATT 15960 

GCCATGGAGA TGTACACCTC AAAATGGCTA AAATGAATTC TATGTTATGT GAATTTTACC 16020 

TGAATTTAAA GAAGAGTAGA AACAAACACC AAGAAAAAGG GAGGAAAGGA GGCATTATTG 16080 

AACAAGACAT TTCAACAAGT TTTGGAATAT GGAAAATATA CGGAGAAGTG GCAACTGACT 16140 

TACCAGAGTG GCAGAAGAAA TAGTCTATGT GAGTGTGGGG AATGGGGTGG ATGTGGAACC 16200 

AGTGAGAAAT AAGCCGCTTT ACTGGGAAGA ACTACAGAAA GACTGAGGCT TGGACGCAGC 16260 

TTGTGCTACT ACAGGTAGCA GTAAACAGGG GGATTTGTTG AACTTCAGAA TATAGAGAAT 16320 

TTTGATGTAA GAGGTTTTTT TTTTCTCGTC TCAAACCAGG AGACTTTTTT TGTTCTCTAG 16380 

GTGAGGGAGA TCTAGAGACA GCCAAGTACA GGGTGCAGTA TCATCTAGAA AATAAAGAAG 16440 

AGGTTTGAGT CTGCAGGTGA GACTCCTGCT CTCTTCCTGG AATGCTGGCA GCCAGGCTTA 16500 

GATCAGCCTC TCTGCCCTGC TCCAGGCAGA AGATGGAAAG ATCCCTTTCT GGAGAAACTG 16560 

ACTCATCCAA GAGATAACAG CTCATATTCT TACTTTTTAG AGCTCTCCAG TAAAATGCAG 16620 

CTCAACACTT GATCAGTTTC CAGCGATGAC CCCTGATCAG GCCCTCACTA CGAACCTCTG 16680 

GGTTTTAATT GGTTATTTAG TATCTCAATT TTAAAGATCA AAGACAGGAT CGCTTTTGAG 16740 

GAAACTTCCA ACTTTAATGA AAGAATTTAA AAAAAAAAAG GAAAAAAAAC CTGATAGTGT 16800 

AAAGAGCAGA GAAATGGCAG GGAAATGAAA ATTAAGTTAA AAAACAGAAA CTTTTATATA 16860 
ATTCTAATCC TTTGCAGAGA TAAAAAAATA CATTGCATAC CTAAAACAAG TACAAGTTGC 16920 

CATGGAAACA GATTCATTAG TGAAGAGGAA AGAGATCTTG GAAATTAAAG ACATAAAAGA 16980 

CAAAATAAAA ATTAAAAAAA TTAAACAGAA TTTAGAACAT AATGTTGAAA TGAGAGAACT 17040 

TTAGATCTCA AAAACAACAG AGAATCAACC CAGGAGATTG TGTGTGACTA AAGAAGTCTC 17100 

AGAAAGAGAA TAGAGGAAAG GAAGGAATAT TATAAGAAAA GTTTCAAGAA TAAAAGGTCA 17160 

TGGGCCTCCA GACTGATAAA AATCCATCTT GTACCCAGAA AAAATTGACT TTTCAAGAAC 17220 

TGAATCAGAA CCTATCCTGT GAAATGTTAG GACAAGTAGA TCCTAAAATC TTCCAGAGGG 17280 

AATCCATTCA AAGGCCTTGA ATGGCATTAG ACTTCTCCAT ATCAATACTG GATGGTGAAA 17340 

GAAAAAGAGC AATACCTTAA ACTTGCTAAA AGAAAATGAT TTTTAACTAG AATTCAATTT 17400 

CCATCTCAAT TAAAAAACCC ACTGTAAAGA AAAAATTCAA ATCTTCTCAG GCATATAATA 17460 

ACTCTAAAAT TCTACCTCCT GTGCACCTAA TTTTGGCAAG TATCTCAGGA AGATACACTT 17520 

TGCTAGAACA AGGACATAGT TTAAGAAAGT GGAAGAAATC AGATCTGGGA ATCAGGGGAT 17580 

CACATGATAC AGAGGCACAG CCAGAGGGAT CCCAGGGAGA GCATGTCCAG TGTGACAAGG 17640 

AGTGGACAGC TTCAGAAGGG ACAGCACCAG GGGAAAAAAC AAAATGAATA TCTGATTGGC 17700 

ATAAACATTT GGAAAGTAGT ATTAAAAATG TGTGTAACAG GTGTGTTGTT ACATTTGCCA 17760 

AAAAAGAGCA AAAGGGAAAA AAAACCCCAA GCAGATGAAA AGTAAAGAAG GCAATGGTTA 17820 

ACTACTGGAA AAACAAAAAA CAATATTCAA GAAAGGAAAC GAAATCATGG TATACTTCTT 17880 

GACTAATGGG TGAAAAATGA AGATGTACAT AGTTATTAAA ATGCAAACAT TGATTATTGA 17940 

GTTAACCCAA AGTTGTGACA TTTGGAAGCA CGGGTAGGCA CAGTGGGGTG TAAGAGACCT 18000 

AAATCCTCAC TTACCGTAAT GTTTAAAAAA TTGCCATGTC AAAGAATAGC AGCATATCAT 18060 

ATTATTTAGA AATATGGATG CAAATGCCAG AAGAAAAATT AAAGGAAGTG AAAAATGTTT 18120 

TCCTCTAGGA ATAGGACAGG GGACGTAATA GGGAACAGAT ATTCTGCATT ATCTCAATTA 18180 

ATTCTCACAA CTGTGACTGA AGCTCTTTTG CTCTCCTTGT TTTGCAGATG AGCAAACTCA 18240 

CAGAGGGATG CAACTTGCCT AGGATCGTAT AGCCAGCAGC TCATGAGTGT GGAATGGGGA 18300 

TTCAAATAAG GTCTAGGAGA CTCCAAAATC CATGTGCTTA ACCATGAAGT TTTACTACCC 18360 

CTTCTCTGCT TCTTCATTAA GTATTTTTAG TGCCTAATTG CCCATGCTCT CTGCCAGGTG 18420 

CAGTAAAGGA GGATTACACA GGTGCAATAT GAGCCATGAC TCTTGTTGAA ATCAGCACGT 18480 

CAAAAATAAG GCTAATGAGC ACGTGAAAAG ATGCTCAACA TCACTAATCA TTAGGGAAAT 18540 

GCAAAACTGC ATTAAAATAT CACCTCATAT ACATTAGGAT GGCTACTATG AAAAAAACCA 18600 

GAAAATAACA AATATTGGCA AGGATGTGGA ATAACTGGAA CACTCATGCA CTGTTGGTGG 18660 

GAATGTAAAA TGGTGCAGCT GCTGTGGAAA ACAGTATGAT GGCTCTTAAA AAAATTTTAA 18720 

AAAAATAGAT TTCTCATATA ATTCTGCAAT TCCATTCCTG GATATATACC CCAAAGAATG 18780 

GAGAAAACAG GATCCTGGAG AGATGTTTGT ATACCCATGT TCATAGCAGC ATTATTCACA 18840 

ATAGCTAACA TCTGGCAGAA CCCAATGAAT GAGTGGATAA ACAAAATGTA GTATATACAC 18900 

ACAATGGGAT ATTAGTCTTA AAAAGGAAGG AAATTCTGAC ACATGCCACA ACATGGAGGT 18960 

GCCTTGAGGA CATTATGCTA AGTGAAATAA AGCCAGTCAC AAAAGGACAA ATATTATATG 19020 

ATTCCATTTA TATAAGCTAC TTAGAGTGGT CAAATTCATA GAGACAGAAA GTAGAATGGT 19080 

GGTTGCCGGG GATGTAAAGG TGGGCATTTC TCAAAAAACT GAGAAATACA GAAAAATAAA 19140 

AATCACTCAC TGTTTGCCAC ACTTCTACCC TGGTTCTTTT TAAATCTATT TTTCTTACTC 19200 

AAAGAAATAC ATGTTTATAG TTTAAACATT CAAATAGTAC TACAGGTTCG TAATAAACAA 19260 

GAGCGGTCCA ACTCCCCTCC TCCTAGCCCT GTGCTCCAGT CCTTTCAGAT GTTGTTTCTG 19320 

GTCTTTGTAT TTCTCAATAA CATGCCTAAA TGTATTTTCT GGCTCCTTGT ATTGTTTATT 19380 

TATTATTTGT TGAGTTTATT GCTATGAAAA ATAGAGATTA GATCACTTAC AGGGTCTTCC 19440 

TGACACCGTG CTCACCTTCC CCACCTATAT GTACAATTCA CCTTCCCTGT CCTCATGGAA 19500 

ATAATATTAC TCTTTTAGTT AAGTCACAGG TCAGTATTTA TGTTATGATT ATGTAAATAT 19560 

TGTTTATGTA ATGTGCTAGG GCTACTTTTT TTTTCTTTAA TTCCTTATCC TCCTTCACCC 19620 

TCACCACCCA ACCCCAATCT CATCCTGGAG TTCACAGTTA TCTCATTTTT CCTTTGCTTG 19680 

GTTTTCTAAA ATCTATCTCC TGGCTCTTTC TCCAACTCTT CTCTCAGTAA GATAGTTTCT 19740 

CAGCTCTACC TTTTCCCCTT GTTGACATTG CTCCAGAGCC CTTCAACCTG CTCAGGTGGC 19800 

TATTCTGCTT GGTCACTCAC TTGTCCTCCT AGGTTTTCTT ATCTCCATCA TCTTGGGGAT 19860 

TCTGGTCTCC AATTTCCTGT GTTAGACCAA CTGTGTCCTG GATCCCATAT CTTTCTGTCT 19920 

CTTAGTTTAT TTCTTTGCTT TGATTGAACA TACTACCTAT GACATTTCTG AGAAACAATG 19980 

AAAGAGAAAT GATTTTTTGA GTTGTGGGAT GAATATTAAA GTCACTACCC GGGAAGGATC 20040 

ATTGTGCCTC TATCTGTATG AGGGATTCCC CTTGCACTTC TCAACCATAG ACAGCTCTGT 20100 

TCTGTCTCTT GAGCTCTTGG TGAACCCATC CCCCAGGACA ACATTTCTAT GTGTCTTGGT 20160 

CTGGCACAAG GTGACTACCT ATTCCCAGCA AATGCCAATC AACACCTGTC TTAATAATAC 20220 
CTTAGCTTCA ACACCCAAGG TTTAAGTTGC ATTAATCACT TAATAAAGAA ACCTTCACAA 20280 

ATGCTAATTA CTAACCTAGT CCTTAAACCA TACTCATTTA AAGAGGTGGC ATCTTAGAAG 20340 

TTACAGTGTT TATAGTCATT CAACAAACAT TTATTGTCAG CCATATAGAA GACACCATGC 20400 

AAGGGCTTTA CATGGGTTAT CCAATGTAGT CCTCATGAAG GTCCTGTGAA GTGGGAATTA 20460 

TTGCCATTTG TGAATGAGTT TCAGAGAGAT AAAACTTCTC CAGCCATTCA TTCAACACAT 20520 

TTACTGAGTA TCTACTATGT GCTAGAAAAT GAGGATACCG CAGGGGGCAG AGGCACATGT 20580 

CCCTGACCTC TTGGAGTTTC TAGTCTAGCC TAGTCTGTTT CCAAGGGTAA CAGATATTAA 20640 

ATAAATAATT TCACAAATAG TCTATTAAAT ACATTTGAGA CAAGTGTCAT GAAAAAGAAG 20700 

TACAAGATGC TATGGGAATG TATAAAGGCC ATAAGCTGTC CTAGTCTGGG GCTCAGAGGT 20760 

GGTTTTTCTG AAGCAGTGCA TTAAGTCTGC AGGATAAGGA AGAGTCAGCC AGATGAAATG 20820 

AAGTCTAAGG TTGGAGAGAG GGAGGGAACA GCATGAGCAA AGGCTCAGGG GCAGGAAGGG 20880 

GCTTTGCATA TACGAAGAAC TGAAAGGCCA ATGCGGCTGG AACAAAGAAT GGAATGGTGT 20940 

GGCATAAAGT GCAGCAGGGA CCGGGTCAGG GAGAAGACCA TAAAGCATTT GTGCACGCTG 21000 

TTAAAGAATC TGTATGCAAC CTTGGTGGAC GTGGGAGACA TGACTGCTGA ACTTGAAGCG 21060 

CATCCCTGGA GATGGGGATA AATGGAGGGA TGCGGGATGT GTGAAGCAAG AGGCTTGTTC 21120 

ATGGTCAGAA CCGGCATCTG AACCCAGCTC TCATGACAAG TCTGCTGCTC TTTTTGGTAC 21180 

ACAAAACCCG TTTCTTTCTC TGTGTGAGAA TGAACAAGGT GCCTGCACAT TTTTCTGTCC 21240 

CAGTGCAGTG TTTGAGGATG CTAAGTTACA CCCCAACAGC TGTGCAAAAT CTGTTTCTCT 21300 

CTTGTGTAGT GATGGAGGCT ATACATTGTG TTGTGAAAGG TGTCACTCAT TTGGGAAATT 21360 

AGAACAAAAC ATAGTCATTG CCTTTAACAG CACACAGCCT AATAGAGGCA ATAGGAATGT 21420 

AAACAGGGTC CCAAGCCAAA ACTTAACATG AGCAAGTTAT AGAATCATAT ACAATTCTTA 21480 

GGGTCATAAT TCTAGGGCTA CATGTTTTGA CTGTTTGACC ACACTATATG CAGCAGTATC 21540 

GTTAATGGTC CTGGATCTAG GCAGCATTTT CCGAAGTAGA CTTAAAATAA CATCACTCTT 21600 

AGACTGGTCT GATTCTCTGT TTTGGCTAGA AATTGTGTTC CTCAAGAATA ATAACACATT 21660 

TAAAATCATC CTTATTTTTT AAGTTCAGAT ATTCTGCTAA ATCATTGATC TCCATGAATT 21720 

CATTGGTCAA TGTTTTAAAA CTTTCTCACA AACGGGCTTA TTGGAAATGG AGGCAGAAAA 21780 

TAAGGTGTTC AATAATATGA CCACATGGTC TAAATTTCCT ACAATACGCT TAGTTTACAT 21840 

GTGCAACACC TTTGTCAGAC ATATACCCAA TTTTGGTTTG AAAATAGCAT TTACTTCCCA 21900 

GGAGTGGTGT GTAGGAACTT AAGGGTCCTA GTATGTATGT CTCTAGTGGA AACTTTGGGG 21960 

TTCAGTTTGA AAAGGCAGTG TATCTCATGT GGATCCCTGT GATTCTCAGG GATTCTATAC 22020 

TAGGCAGTCC CTTGTGGATG CCTGGGGAAG TCGGGCTGTG ATCCTTACAG ACCTTCTCTG 22080 

AGCTGCCATA CAGATGGGGC AGAGGGTGAA TGATGGAAAA AGAACAAATG TTGCTGATGG 22140 

TCCATGATTC GTCCGCAAAT ATTGTAAAAC CCTGTACTAC CTGGCTGATG CTTTAACAAA 22200 

ATAGCTTCAG GGACATTAAA AAAGTAGTGT TTCCTGGTGT GCTGGTAAAT ATTTATTGAT 22260 

ACAAAGATTG TGTAATCACA ATTTAAAATA TACAGTACTC TTGATTGTAA ATTCCTTATA 22320 

ACCAATTGAT CCCCACAGAA TGCTCTTGTT GACTTTTGTT TGAGGCTCTT GTATCTATAG 22380 

TGTATCCAAT CTATTATTGC AATTGATGGA CAAGTGCCAT TCTGATAAGA ATGTGGGCTG 22440 

AGATTTCCCT TTATGTTAAT GAGTAAGAAG AAAGGGAAAC AGCAGAGCTA GACACTGGGC 22500 

CTTCAATCGT TTGTTAACAA CACGAGCAAC CTTTTTGTTG AACTGGATAA TAGTTTTTGA 22560 

ATACTGGAAG AATATTTCCT CAGTCTTTTT CTGTTATTCA CCATGCATTG GCTACAGTCA 22620 

CATTTTAGAA TTTAACCTGC ATTATTAGCA TTTCTCCATC ACTTTTTATA AGTCTAGACT 22680 

GGGGATTATT AAACTGTGGT CTAGGGGCCA TATCTGGTCC CCTGACCTGT TTTCGTACAT 22740 

AAAGTTTTCT GGAACACAAC CATGTCCACT AGTTTTATAT ATTGTATATG GCTGCTTTTG 22800 

TATTACAATA GCAGAAGCAG AGTCGAGTAG GTTGGACAGA GATTTAATGG ACGCAAAGTC 22860 

AAAATTATTT AATATCTGGC CTTTTGCAAA ATAAGATTTA CCAAGCCTTG GTCTGGGTGG 22920 

TCAACAAAAC AATAAATCAA GCCTTGATCT GTAGTGTCTG CCAATTTCCA TGGTGTAAAT 22980 

ACTCCCATCA TGGCCAATTT CTATCTACCA ACATGACACA GCAAAACATA GAGTTGGGAA 23040 

GAGATGTGTA AAGTACACCG TTATAGAGTA TTCTCACTCT ATAGCTACAG TGGCTATAAA 23100 

TAACTTCCAG AGCATAGACA ATAGTAAAAT GTAGTCATAA TTAAGAACTG GTAAGTTTTG 23160 

AGTGTTTATT ACCTTTGTTT CTAAATACAA TTTATTTAAT TTTAAGTTTA TATTTTAATT 23220 

TCGAATAATG GCTGGGTTTA ACAAGTGGTT TGCAAAATCT CTGAGAACTT AACAATCAGT 23280 

TATCATGAGT TGGCACTATT GCTTTCCTTT GGTGCCCAGC TGTCTTCTTT TTTCAGCCAT 23340 

TTCCCTGTCT CCAGGAGATA ATCCTTTTTT TTCTTCTCAG CCTGTCTGCT TCCCAAAGTA 23400 

TCCTTTGTTC TTTTCATGGC CCTCTGGCTA CGCAGGGACC CCACTTTTTG CCAAACTAAT 23460 

CTTTTAAAAC ATATGTCCCA CAGAGTACCA TTCCCTTTCA TCTGCTTCCC ATCAATACTC 23520 

TTATTTCTAC AATAGGGTTG ATACCAAATG GCCAGCAACA ATTTGTAATA AGCTGTAAAT 23580

GATTAATGGC CTGGAAACAC TTGCATTTTA AAAAAAGGAG TCTTGTTGAC CCAAAGGTTA 23640

TAGGGTTTGA ATGTCTGGCA ACATTGCAGG TGTGAGGAAC GTCTTTGGAA TTCCTAGTTC 23700

CCCCCAAAAG GTTACTGTCT TCTTCAGTGA CAAACAACCA ACCCAAGCGT GTACCCTGAT 23760

GCTCCTCATT ACCCCTCAAA ACTTTTTCCT TTTCAATCTT TTTAGTTTTA GCTCTTTATT 23820

TCCCCTCCAC TTTCATTCCT TATTTAAACC TCTCAATTGT AACTGAAGCA GATGTTATAT 23880

GGACTTGGGG AAAGGGATCA AGAAATCATT CAGTTGTTTG TGCTTATCTA GAACTGTCAG 23940

CCCCTGAATT GTGTGGTCTT GGCTGGCATC TGAGCACACC TGGTGCATCA GCAGAATCAG 24000

TGTTCTCTCA GTTCCTGGTT GGCTCTACTG TCTGGCACCA TTCGGCTGTT TGTACTTATC 24060

TGGAACTGCC AATGGGAAGA TCACATGGTC ATTGAGAAAC CGCACCCTGA AGAGATGGCT 24120

AAAAGCCTGG AGGGCATGCC CATCACAGCC TTGCCGGGAG TGTGAAAGGT GGTGTGAAGA 24180

CCCTGGGGCT CACAGGACTC CCTCACCATG GGGCACAGTG TAAGAAGGTC CACGGTGAAA 24240

ATGCAGTAGG AGGCAGTTAC ATCAGGCTCT GGATCGATGA TATCAAGGAA CAACCCAGGC 24300

TGAAGGAAAA GGCGTTTGTG TTTCAGGAAA GATGTATTGA GCCTCATCCA TGCTCCAGAC 24360

TTTGTTTAGG CCCTGGGTTA CAGCATGGAA TGGAATGAAA CCCCTGTTCT TTAGTTTCTT 24420

ACATGTTGAG TGGGTGAGAC AGAAAGCAGC AATATGGTAA AGAGGGGGGA ACAGGGGAAG 24480

AATGGTAGGA GATCAAGTTA GAGAGGGGAA TGGGCTAGAT CATGGAGCAA CCGGGGCAAG 24540

ATGTCAAGCC CTTGGAAGGT TTTGAGCAAG AGAGTGTTAT GTTCTGACTT ACGTCTTGAA 24600

ACACTCTAGT TGCTGTACAA GGAGACCAGG TCAGAGGCTA TTGCAGTTGT CCAGGTGAAG 24660

GTGGCCAGGT AGCGATGGAG GATGAGAAGT AGAAAATTCT GTGAAGGCAG AGCTGACAGG 24720

ATTTACAGAT GGATTGGCTC ATGAGAGGAA AAGAGGGACT CACGGATGAT GCCAAAGTTT 24780

TTGACCTGAG AAACTGGAAG AATGGAATTT CCACTTACTA TGATGGGAGA GGTTGTGAAA 24840

GGATGACTTA GGGGTTGGAG AAAACCAGGA GTTTGGATAT GGGCCTTAGA TATTGCCATG 24900

CAGATGTTGA GTAGACAGCT GCACATATGA GTTGGGAGTG CAGAGGGAGA GGCTGGGGTT 24960

CTGGGTATCA GTATATGAAT CATCTGTGTC CACATGGCAT TTAAAGGCAT GAGACCAGGT 25020

GACCCCCCTT ATAGAAAGAT TAGATCCAAA AGAGTAGTGG TCTGAGGACT GGGCTTTAGG 25080

CCCTGATGCT CAGAGGTGAG GACCCAGGAA AGGAGACACA GAGAATCCTC TTTGTCAGAG 25140

CATTACAAAA GGGCTATTTG GAAATAGTTC AGGTGGTGAC TGGGTGAAAA GCCCTTCGAA 25200

CAGCCTCAAG GACCCAGGCT GGTGGACTGC TGGCTGAGTC CTGTTGTGCC TCAGAGGATA 25260

TTGTAATATT TGGAAAAATT TCTCCAAGTC AAATTTAAAT TAACATGAAT GTCATATGGC 25320

TTTTTGGTAC GTCCTACAGT CAAGCAAATA ACAATTGGAT AGGGTAGCTG CAGGAAGACT 25380

GGGTGTCTCT ACAGTGGTCA AGTTGGAAGA ACAAAGAATG AGTGATTGAT CTTTTGCTAC 25440

TCCCCAAGGG GAGAAGCCAC TGATAGCTTC CTTGGAAGCA CTTTGTACCT CACCTGCCCC 25500

AGAGTAGATT AAATATTAAG TTTCCTCCCT TCTTTCAAGT CCTAGTGCTG CCATTGATAG 25560

TGCTGTGACT TCAGGAAAGT TGCTTAACTT TTCCAAACCT CTATTTCCTC ATTACTAATG 25620

AGTAATAATT CCCACCATAG GGTGTTTATA AAGATTAAAT AATTTTAAAT ATGTTGAAGC 25680

ATGTAGTGAA CTGCAAAGCA ATATGCAAAT ATAAGAGGTG GAAATGACTA TGCCTATAAT 25740

TACGTGGCTC AATTTACACA ATAATAGATT TTCACACTTT GCATAAATAA TGAGGGTTTT 25800

TATACTCAAG TCACTGAACT TACTATCTTC AGGATCCAAA ATCCCCAAAC AGAAGGCATC 25860

CCCTACTGTT AGCTCAAATA GCTCTTGCTG GTTTAGAGAG TTAATGCAAG CCCCACTGCC 25920

TCCTGAGCTG GAAACATGAA ACAGAAGTTT CAGTTCCCTA ATCAATCCAT TCTTTCTTCC 25980

TCTGGCTTCT GATAGGCCTC CTCCTTATCT TTGTAAACCC TGTAGCTGGT TGCTAGTTGA 26040

AAGTGCCTCT GATCTCCCTC TTCTGCCTCC CATGATGTTG ATAAAAAGCA CGAGGGCACA 26100

TGCAGGATGA AAACGATCGT GGTCCTGCCA GCCTGAATTA TTAAAGCATT TCAGTCCTAA 26160

GTATGAGGTG TGTATATGTT GGGGTGTGGA GTGAGTTGTG GAGATGAGAG ACAGCTGAAT 26220

TACATAAAGT TGAGAAGATC TGAGTTCTAG TCTTGAAATT CACAAGCCAT CTCTATACAA 26280

TAGTTCCGTT ACTCAGTAAA GTAAAAGCAT TGGATCTAAG CTTTAAGGAC CCTTCTAGTT 26340

CTTTCTGATT GGAATTCTGT GACTTCATCT TTTGTGGGTT AGAAACTCAT CACTCTGTCC 26400

AGTTATTTCT ATATTATGCC ACCAGATGGC AATGTTTCCT TAACCCCAAA GAAAGTTTTC 26460

ATTCTGGTAA AAAGTCAAGT TTTGTTGCCA ACTTTTCCCC CTCTGAACGT GCAAAAGAAT 26520

GATTTTCCGA AGCTGTGGAG GAAAGAAAGA ACTCTCCTTC TGAACATCTC AGGTGGTTTA 26580

TGCTGGAAAC AGACAGGACC CTGTTTAGAG AAGATCTCTC TTTTCTTCGT GGACTGGGAA 26640

CTCCAGTTGG AATGATGTCT CCTGTGATTG CGTATGGTGG GAGGTGGGAG ATGTTGGAAT 26700

TGGCGTGTCC TCAGGAGGCT TGGGGGTGGG GGAGATGTGC CCTAGCTGGT GGGCCTGCAT 26760

GAGCCCTGCA AAACTCTGAC TTATAGAGGG GCATCAGATG CCAAGTTTTA CCAGACCATG 26820

CAGAACTAGG AATTGCCAGA TGCACTCATA GGGCAGCTAA AATGGTCCTG GCAGAATCAG 26880

ACTCTTTCGC TCATAAAGGT CAGAGACGCA AGAAAGTGAC ATAAAGTCCA GCCCTTTTCT 26940

TGTGCAGATG GGGAAATTGA GGCCTAGAGC AGGTCAGCTG TCCTGATTCT ATCTCCTTGC 27000

CAAGTTACTT TGTATTTAAA CATTTCAAGT AGACTTTTCA ATCATCTCAT CTTGCTGTGT 27060

TCAGCTAGCG CACCTTGTTA AGCCTGTTGG CCTCCGGGCC TGCCAAGCCC CTGCATCTAT 27120

ACACACCAGG GCATGCTGCA TGCGCTCAGT GAGACTTCAA CAGCTGACTG ATTCGTTCAA 27180

ACCTATCAAA CAGCAGACTT AGCTAGTTGG GGAGAAAAGT CATTTAAAGT AATTGCTTAT 27240

TAATCTGCAA AACAAGTCTC ATAGCAGGTT TTTATTTTAT TTTATTTTAT TTTTTTGCTT 27300

TTAACAACGA TATAATAACA ACAAACATTT GTTTAGTGTT TCCTGTGGAC CAGGCTCTGT 27360

GTTAAGCACT TAACATCACT ATATCATGCA CTTTTGCTAA TAAAGCTGTG AAATAGTTAT 27420

TACTATTTCT GCTTTACAGC TGCAACAGAG ACTCAGAGAG GTTAGGTAAC TTGCCCCAGG 27480

TCACAGAGCT GGAAGGAGCA GAGCCAATAT TCACACCCTG ATTTGCGTAA TTCCAGATTT 27540

GATCTTCTAG CTTCTATGCT GTGCTGCCTC TTCATGACAG TTTTTCTCAT GTACAGGATC 27600

TGATGCAGAA ACTTATCGGA GTTTCTTACC GGAGCACCAG TCACCTCTCA TCATTTTCCT 27660

GTTTTGACGT GAAGGCTCAG TGATAGTGAG CAGGCTCAGG GTCTACAGAG TTGGTGATAT 27720

CAGCATCACA CAGGACATTC AGAATGTTGA CTCCAGGGAT GTTGAGAGAT ACTCCTGCAC 27780

AAAGCTGCCA GCACCCGTGT CCAAGAAACA CTCAGAATCT AGGTCTCCTT GTATATTTTC 27840

CCCACTACCT GCAAAGGTAA AGAGGAACAG GCAGTGCTGG GACCGAGGGA GCGACAGTCC 27900

TAATGGAAGC TAGTGTGTTG AGAGTCTCCT CTGTGTCATG CTCTGAGCGA CATGTTTTAT 27960

ATGCACGATC TCATTTAGAC CTTGTGACAG CATGTTGTAG CAAGGACCCC ATCATCACAG 28020

GGGGCAAATG TCTGCAGTGC AGAAAGTCGT CCTGAAGAAA TGGATGTCAG ATAAAAACAG 28080

TCTTCATAAA TCAATGATCC TGTTTTACCT CAAAAGTGCA TGAAATGGAA ATGGAAATAT 28140

CTTGTGAAGA TGTAGACAAA TGACGGTCAT TGCCCAGAGC AGTAGTTACT GTCAGAAAAA 28200

GAGATAAGGA TTTCCAGTCT GACAGACTGG ATTCCTGGCT CAACACCACC CCCTTCTAAC 28260

CATGTGACCT TGGGCAAATT ACCTAACCTT TTCTGAGTCT CAATTTCCTC ATCTTCCAAA 28320

AGGGGATAAT ATCATATATG TTCCAAGATT GCTGTGAGTA TTAAATGAGA TGATGTATGT 28380

AAAGTACCTG GCCAGCAGTT TCTGGCACAT AGTAAGTATT CAATAAAGAC TAATGGTGGA 28440

GATGAGTATA GGGGCTACTA ATGCCCATCC TTACTCCAGA GACTTCTTTC TGACCATCAT 28500

GAGGCACTTT TGAATATCTA AACCCATTTA AAGCCCACTT TTCTCTATGG CTGGCCATTT 28560

CTGCCTATTG ACAGCTAATT TGCCTCATCC TACAGGACAC CTTCCATGTT TCCCCAGACT 28620

CCAGAAATCA GGTATTAAAT TATCAGGGCT TCAGGAGCCA TGGTCTATGA TGAGTTTACT 28680

ACCTGTGCCC AATAAATGTT TAAGAAATAA ATAAGAGCCA ATATAACTAT AAAGACCAAG 28740

AGCCAAAATA AGTCTCTTTG CTTGCGCTTT AGATCTTAAG AGTCCTTTAT ATTCAAGCTG 28800

CTCAGAGTCA AACGTGTGCC TAATAAACAT TCTACAAAGG TCCTGGCGTG GTGTGACCAA 28860

AGGAAGAGAG AGGGCTCCAG TGTCTGTCAC TGGGAGACCA GATGGACAGC CACGTGGGGC 28920

AGGGCCACTG GTGCCACATG TCCAGGTCTG TTAAGCCCTA TGAAAGACAC TTGAGTCAAA 28980

ATGTATTTCT ATCTAAGAAA GAAGACTATA AATGGAAAAG GGAGAGGGGA GAAGACCTCT 29040

CAAGGGCATC TCCCTCTAGA AGTAGAGATT GTGAATCTGC AGCAGAAAGG TTTTAAACAA 29100

GGGATAGCAG AATGCCTGGA TGGTGTTCTA GTGCCTGAAT GGAAAAAGGC CACAATGACC 29160

AACAAATCCC ACCTACATCC GCCTTCCTCG CTGCCTGAAA TCCCACCATT AGGATTTTTT 29220

TCCTTTTGGG TTAGCAACCA AGAAAGAGTA AAGTCTGGAA GACTCTTATT CCACATCTTC 29280

ACTTTGCAGC GCCTCTTTTT TTTTTTTTTT TTGAGATAGA GTCTTGCTCT GTCACCCAGG 29340

CTGGAGTGCA GTGATGCGAT CTCAGCTCAC TGCAAGCTCT GCTTCCTGGG TTCACATCAT 29400

TCTCGTGCTT CAGCCTCCCG AGTAGCTGGG ACTACAGGCA CCTGCCACCA CTCCCAGCTA 29460

TTTTTTTTTG TATTTTTAGT GGAGACAGGG TTTCACCGTG TTAGCCAGGA TGGTCTTGAC 29520

CTCATGATCC GTCCGCCTTG GCCTCCCAAA GTGCTGGGAT TACCACCTCT TCTTAATTAC 29580

AAACATAAAC AAAAACTAAC AACTTTCTAG TTTTTTCTTT TTCTTTTTTT TTAAATTACA 29640

AAAGAGATCC ATATTCGTCA GAGAATAATT GGAAAAAAGA GATAAGCAAA ATCAGAAAAA 29700

TAAATTCAGC CTGTAATCAC CCAGAGATAA CAATTATTAA AATTTAGGTA TTCACTTTGT 29760

TATTTCCTTT TATAACAAAA CTTTTTTTTC TTGTGAAATT TAATAGAATA CAATTGAACT 29820

ATTTTTTCCT TTATGGTTAA TGATTCTTGT TTCTTATTTA GGAAATCATT TCCTGAGTCA 29880

TAAAGAATTC TCTCATATTT TCTTCTAAAG CTTTATACAG TTTTGCCCTT CAAATAAGGT 29940

TAATAACCCA CCTAGAATTG ATTTCTGTGT ATGGCATGCA GTAGAGACAA GTTCTACTTT 30000

TTTCTCTCAA ATGAATATTC AGTTGGACCA AGGCTGTCCT TTCTCCACTA CTTTGCAGTT 30060

TCACTTTTTG TTGAAAAATC AATTGTTCAT ACATGGGTAG ATCTCTTTCT GGGCACTCTT 30120

GGAGTCTATT GGTCTGTCTA TAAGTTGAAC AGGATCAGAC AGGCTGTGCT TTGTTTCAGG 30180

TAACAAAGAA CCCCAACATC TCACTGATGA ATACACTAAA GTCATTTTTG TTTTCCATTG 30240

GCAGTTCACT TCTGATGCAG GAGATGCATC AGGGCAATCG CCCTTTGCAA GGTGAAGTGT 30300

CTGCACCATT GGAAGTACTC TCCATCCAGG GGAGAGAGAC TGGAAATGGT CCATGAGGTT 30360

TTCATCGACC CAGAATGAAA GCATCACCCA TCATTCCCTT CTTGTTACAG CCTATTTGTC 30420 

AGAACCAGTC AGAGTTCCAC CCACCTGCAA AAGGTTGTGA CGTGCTGTTT GCGATTTGCC 30480 

TGGAAGGAGG GAATACCCAG ATACAGGAAA ATGCTAGTGA CGTGCACTTC CATCTAACTA 30540 

TCTTTGAATG AAAATGACAG TCTTAATTAC TGCAGTAAGA TAAGCAGACT CTATACCTGG 30600 

TAGAGCAAGT CCTCTTACCC CATTTCTTCT TCAAGAAGGT CTTGGCTAGT TTGGAACCTT 30660 

GGCAATCCCA TATAAACTTT AGAAAATGCT AGTTAAGTTC TTTAAAAATC CTGCTGAGAC 30720 

TTTTATTGAA TCCATAGCTT CATTTAGAGA GAGCTGACAT TTAAATTAGG GAGTGCTCCA 30780 

AGCCACTAAC ATAGAATTTC TCTCTTTTAC TCCAGGTCTT CTTTAATTTC TCTCGAGTGT 30840 

TTTGTAATGT TTTGCGCAAA GTTCTTGCAC ATCTTTTGAT AGATTTCCCC CTAGGTTTTG 30900 

GATATTTTTA AGATGCTAGT GTAAATGTTA TTGCTATATA TTTTTCATTT TACAAATATA 30960 

TGTGTTTAGT ATATAGAAAT TTAATTCATG TTTCTGTATT GACTTTATTG AGTAACCTTA 31020 

TGAAACTTTC TTAAATTCTA AAAATTATCC ACAGCTTCCC ATAGATTTTC TATGTAGGTA 31080 

ATAACATAAT CCACAAAAAT GACACTTCAA TTTTTTCCTT TCTGTTTCTT ATGTCTTTAT 31140 

TTCTCTTTCT TGCATTTCCC ATGTGGGGTC CCTAGACACT GTTGAATAGA TGTCGTGATA 31200 

GTGAGCATCC CTGTTCTGTA CACAGCCTCG AAAGGAAAAT TTTCAGAGTT TTGTTTTAAA 31260 

CAATCTGGTT GTTATAGGTT TTATTGTAGC AGCTCTTCAC CAGATTACCT GCATGTTTTC 31320 

XTTTTTCTAG TTTCTAAGAC TTTTAATCCA TTAATGAGTG GATGTTGAAT TTTAACAAAT 31380 

GCTTGTCTCT GCATGTATTG AAATGACTAT ATGACTTTTC CCCAATTGAT CTGTTAAGTT 31440 

GGTAAATTAC ACTGATATTC CAAAGTTAAA GCAATTTTTA CACTGGCACC CTCAAGTAAG 31500 

CCAAATTTGG ACATGATGTA TTTTTAAATA TATATTGCTG GTGTTGGCCT GTTAATATTT 31560 

TATTTAGAAT TGTTGAGCCT ATGTTCAAGA ATAAAATTGG CTTGTGATTT TCCTTCACAT 31620 

ACTGTTCATA TTGGGTTTTG GTATCAAGAT TACTCAAGCC TCACAAAATA ACATAGGGAG 31680 

TCTCATTTTT TCTATTTTCT GGAAGAGTTT GCATAAGTGT GGCATTATAT CTTCTTTATC 31740

TCATAAAATT TGCTTGAGCC ATCAAATCTT AACATTTTAT GACAGGTTGA TTTTTTATTA 31800 

AATCAATGAT TTTAATAGTT ATAGGATTAT TAGGATTTTT TATTTCTTCT TTTGTTAATT 31860 

TTAGTAAGTA GTGTTTTCCT AGGAATTTGT CTATTTTATC AAAATTTATA AATTAATTCA 31920 

CAGAGTTGTT TATAATATCT TCTAATTATC TTTCTAATGT CTGCAACACA TGTAATAATG 31980 

TTATTTTTGC TTATAAATTG ACAATTTATA ATTGCGTATA CTTATGGGGC ACAAAACAAT 32040 

GTTATGATTT ATGAAAGCAA TGTGGAATAA TTAAATCTAG CAAATTAATA TATCCATCAC 32100 

CTTAAATACT CATCATTTTT TGTGGTGAAA ACATTTGAAA TTCACTTTTT TTCACAATTT 32160 

AAAAATGCAC AGTACACTAT TATTATCTAC AGGTGGTTCC TGACTTCTTA TGATGATTTG 32220 

AATTATCACT TTTCAACTTT ACAATAATGT GAAAGGAATA TGCATTCAGT ATGCTCTATG 32280 

ACTTATGTTG GGATTATGTC TGGATAAACC CATAGTAAGT TGAAAATATC AATGGGCTCA 32340 

TCCAGATATA ACTCCATCAT AATTTGAGAA GCAGCTGTAT ATTTATCATG GTGTGCAATA 32400 

AATCTCAAAA AAAGACTTAT TCCTCCCGTC TGAGATTTTG TACCCTTTGG CCATCACTCC 32460 

TTCATTCCCC TCACCCACAG CCCCTGTAAC TACCATTCTA CTCTCTGCTT CTATGGATTT 32520 

GATTGCTTGA GATTCCACAT GTAAGTGAGA ACATGTGGTG TTTGTCTTTC TGTGTCTGGC 32580 

TTATTTTACT TAGCATGATG TTCTCCAGTT TCAGTGATGT TGTTGCAAAT GATAGAATTT 32640 

CCTTCTGTTT AAAGGCTGAA TTATCCCATT GCATGTATAT ACTACATTTT ATTTATCCAT 32700 

TCATCCATTG ATAGACACTT AGGTTGATTC CATAACTTGG CTAGTGTAAA TAGTGCTGCA 32760 

GTGAACATGG GAGTAAGGAC ATGTCTTAGA CAATCTGATT TCAATATTTG GATAAACACC 32820 

CAGAAGTGGA GTTACTTGGT CATATGATAA TCTAGTTTTA GTTTTTAAAG TAACTTTCAA 32880 

ATAGTTTTTC ATGATGGCAG TACTAACATA CACTCCCAAC AGTGTACAAG GGTTCTCCTT 32940 

TCTCCACAGA TGTTCTCTTT TTCATTACTG ACATGAGTTA TCTGTGCCTT TCCCATTTTT 33000 

TGTCTTCATC TGTCTCAGCA GAGGTTTATC AATTTTATCA TTTAAAAGGT AAAAATTGTT 33060 

ACCTTTTAAA TCTTGTCTAT TGTATTTTTT TGTTTCATTA ATTTTTGCTC TGATTTTTGT 33120 

ACTTCCTTTT TTCCATATTT TTAGGAGATG ACTTTGCTGT TCTTCTAACT TCTCTTTCTA 33180 

GGACTCCTAG AAATATGTTA AGTCTGCTCA TTGTATTTTT CTCACCTTTA TATTTTCCAT 33240 

TGTTTTATCT CTTTCTTATT CATTCTGGGT AGTTTCTTCT AATCTACCTT CCAGTTCATT 33300 

AATTATCTCT TTACCTGTGT TGAATTTGCT ATTAAACCTA TCTGAATGAC TTTTTCATTT 33360 

TTTATTGGGT TTTTAAATGT TAAAATTCTC ATTCCTATTT GGTTCTTCCT CAAATTTGCA 33420 

ATGATTTTGT TTCAGCTGAT TGCCAAAACG TTTTTAGTTC AAGTTCATCT CTTTGAGCAT 33480 

AGTGAGCACT GTTGTTTTAC AGTCTTTATG TAAATACCTT CTCTTTTATT AATCTTTCCA 33540 

CGTTTCTGGT GGAGGGACTG GCTATGAGAG ACAAAAACTT TCTTTCAGGT GCTTTTAGGA 33600 

CTTACCCATA TTTCTTTCAT GGTGTCTATT ATTTTATTAT CTCATTATTT AGATACTTTT 33660

CTCCTCTACT AAACTAATGG TTCAAGGCTT ATCAAAGATA AATCCTCTGT CTTGTTCATC 33720

TCTGTGTCTC TCATGGTATC TAGCAGACTT CCACCCAAGA TATAAAGACA CTATGACTAA 33780 

GTGAATGATT TTAGTCTTAC CTACCTGCCT GTTAACTTAC CTACTTGCAT CTCACTTATA 33840 

CTTCAACTTT TGGCTTCTTC CTCAACCTCA ACTACCCCAT TCTTCCCATG GCTCACTGTG 33900 

CTCACTGGCC TCCATACTGT CCCTTAAATA AGGAAAGCTG CCCTAGCCTC AGGGCCTTTG 33960 

CACCTGCTCT GCCTGCTGTT TGGAATGCTC TTCTTCCCAT ATACCCATCT GTTTTAATCC 34020 

CTCATCTTTT ATTCCCTCAT CCCATCTCTT CAAATGTGAT TTCTACAGAG GGTTCTCTGA 34080 

CCACCTTATC CAATAACCAG CATTCCGTCT CCCCTCTGCC ATTCTCCATC ATCTCACCAT 34140 

GCTTTATATC ACATATCACT AAGTGACAGT ATACTATAAA CGTACCCATT TGTTTACTGT 34200 

CTGCCTCCCT AACTAATGTA TAAGCTCTCT GAGGGCAGGG ACTCTGTTTT ATTTGTACAC 34260 

CACAATTATC TCCAGTGCCT TGAATAGTGT CTGGCATGTA GAAGGAATTC AAGAAATACT 34320 

TGTCAAGCTA GGTGCTGTGA TAACTACTTT ATATGAAATT AAGTATTTCT CCTCCAGCAG 34380 

CTCTAAAAGT TTAGTATGTT ATTATTGTCT CTGTTTTACT GATGAGTGAA CTGAGGTTCA 34440 

GAGAGGTTAT TTAGCATACG TATGAAGACA GAATTAGTGA GTGATTGACC TGAGATTTGA 34500 

ACTCAACCTG TGCTGTCTAA AGCTAGCCAG GCAGCCTCAC ATACATGGCA AATGCCTACT 34560 

GAGACATGAA CATGCAGGTT GGGATCCCAA ACTGTTGGGA AGCATAAAAG AAAAACACTA 34620 

AAGATGTGGG GAGTGTAGGA CTTTTTTTTT TAATAGGCCA GTGGCCCTCT CTGCAACCCT 34680 

TTGAATGATC AGCTTGATCA GAGAATCCCC TACCCCTACC CCTGCCTCAG CCAGTTTCTA 34740 

TCTGGCTGTG TCATCAGCTG GCTGATCCAA ACAGCAATGT CAACAAAAGA ATGGTGATCA 34800 

GGCACGTAAA GCAATGTGTC AGAAAGAAAG AAAAGGCAGC TCAGATGATG CAAGATCATC 34860 

CAGATGTCAA GCACTGTGTG GTGGCACACT TGCCCGTTCA TGTTGTTGAT TTTTTAAACA 34920 

TTTGTGATAA GAACAAAAAC TTAGTTGCTT CCCTCAGGTC CTCCCTGTAT GGATTAGTGC 34980 

AGACATCTGC CGCTTCAGGC TTTCTGATTG GTTCCCACTG GTTTGGGGCA AAACCGGAAA 35040 

CTTCTGAGCC AAGTGCAGGG GCAGAAGAGC TCCCAAGAGC TCCTGGGAAA ACTAGGAAGG 35100 

ACAATCAAGA AACCACCGGC AGCTCCATTT GCAGGATCTC ATCCCATCAG GGGCTGTCTC 35160 

AGGAGGGGGA ATTGGAATAC CATTCACCTG TCCCCTTTGC AGATACACCA ATGTCTCGTT 35220 

CAAGAACAAG CAGAAAGGAA ACACCAGATT GCCCAGAGCA CAGGATTAGG ACACACCACA 35280 

CAGAGCCAAC TCAGCGTATC ATTGTTTGCA TTGATCATCT GGGGATGAAG CAGGCTCCGT 35340 

TCTGGAAGGG GCAACCTGAA TAGAGAAGAG TCTGACATTG GAGTCAAGCA GAACTTGGTT 35400 

GGAATTTGGC TCATTGCTGG GTGATCCAGA GACAGTTATT TAATCTGAGA ATCAGATATC 35460 

TTGTCTGTTA AATGGAAATT ATAGTAGCCA CTTCACAGGA TTGCTGTAAA GAGTACATAA 35520 

AACCAGGTAC CTGCAATGTA TAGTGCTAAG CCTGACACGT AGCAGGGTGT TAGTAAGTGG 35580 

TACCTCTGAC TGGGGATGGA AGCCAGAGGA GCTGGACCTT TATTTGACTG GCCAGAAGCC 35640 

AGCTCTCTAG TCACCTTCCT GATCCTTCCT TCTTCTGTGT GTACACGGAC AATGTTTTTC 35700 

TACATAATGG AACAGTGGCC CTCAAAACTT GTTTTCATAA GAATTATCCA GGTTGCTAGT 35760 

TATTAATACT AGTTATCCAG GTTGCTAGTT ATTAATACTA GTTATCTGTG TTGCTAGCTA 35820 

AAAATACACT CAGTTCCCAT CCCCAGATTT TTCTATTTCA GTAGGTGGTA GTGGGTTCAG 35880 

GAAATCTGTG TTTTTACCAA AGTATCCCCT ACTATAGAAT TAATTTTTGT GTTCCCCCCT 35940

CATTCATATG TTGACATTTA AACCTCCACT GTGATGATAC CAGGTGGCTT TGGGAGGTGA 36000 

TTAGGTGATA ACGATGAAGC CCTCATAAAT GTGATTACTG ACCTAATAAA AGAGACCCCA 36060 

GAATGCCCCC TTGTCCCTTC TGCCATGTGA GGTCACGGTG AGAAGATGGC ATCTATGAAC 36120 

TAGGAAGTGG GCCCTCACCA GACGCTGAAT CTGCTGGTGC CTTGCTCTTG GACTTCCCAG 36180 

CCTCTAGAAT TCTGAATAAT AAATTTCCGT TGCTTGTAGC CTAGTCTATG ACATTCTTTT 36240 

GTGGCAGCGT AAATGGACTA AGATGTGCAC CCTCATGCCC TTTAGGGAAT TGTGACTTTG 36300 

AGAAATGCTG CCCTAGGATT TACAGAATGC TGACAAAGCT TTGTTGACTC AAATGCAAAA 36360 

TATTCTTATA AAGACCAAAA TAGAAATGAA TACTCCCTTG AACTCCTTTG GATGTGCACT 36420 

TTGCGTAGTT ATAGCACCTT TTCATCATGT GCAAATGAGA CGCAAATGAA TCCTTAGTTT 36480 

GACCCAGAAA GAATGTCTTT GCTGGTAGGG ACTACGGGAG AGAGAGAAGA GCCAGAATAC 36540 

TGTAGGAAAA TTAACACCGG CCACGAGACA ACTGGTTGCT AGCTCGGTAG CTGTGCAACA 36600 

TTGGCATGTT ACTTGAACTT CTAGAAATCT GTTCTTTCTT CTGTAAAATG AATATGGTCT 36660 

GGAAAGTAAA GACCAGTCAC CTCCTCTATC AGTTGGAGTC TAATCAGGAA GAAACCTAAG 36720 

TGTCTTCAAC AGAGGGAATT TAATGCAGGG AATGGGTCAC ACCAGTGTTA GAAAAGCTGC 36780 

AATGCCAAAG AGGGGATAAA GAGATAGCTC AAAGGTTAAT AAGAGCAGAA AGTCACTAGT 36840 

ATTCATAGGC TGAAAAGAGA AAGGGAGGAG ATAGTGTTCC CGGAATCCCT GATGGGCTTG 36900 

TCTGGAGGGC GCTGGGGCCA TGGAGGAAAT GTAGTAGCTG CTGGAGGCAT GCTCAGGGCA 36960 

GAGAGGGAGC AGAGAAATAC CCTGGCTTCT CATTTTCTTT CTCCAGTCCT TGCAGGCACC 37020

TCACTGGCTG AACTCAGGGG AGCATTTCTC CTCTACAGAA CAGAGTCTCC TTGCATACAA 37080

CAAGAGGGTC AAACAGAGGA TGGCTTAATT TTTCCTTCCA TTTCTCACTT CTATGATTCT 37140 

CTCCCTTCAG GTTAAGTAAG TGAGGGTAAG TAAGCTGCCC AGTAAGTGAA CAGTTTTCCA 37200 

AACAAGCCCA CAGCACCACC TCTATATACA GCAACTCTCT GTTTATCAGC ACTGCATTAA 37260 

CCAGGACTCT CTATTAACTG GGACTTCCAG TTCCTTAAAT TTCTTCATGG TTCCTGTGTA 37320 

CTCCCAAAGC ATCTTCATCA AACAAACATT AAGTTACGCT TAGAGACCAT TTCTCAATTG 37380 

AATATAGATA AAAGATTCTA AGGCCTTGAA AAAAATTAAT ACATGCATAT TAGATATAGC 37440 

TATAAAAGCC AGACTATCTG ATTAATTATG TGACTGGTGT TAAACTGTTT GGACAAAGGT 37500 

TGGCTAAATT CCCTATGAAT ACTTACTTCC CTACTTCTGT GGACAAGGAA AAATAGACCA 37560 

AAGGTTCAGA TAAAAGCTTG ATTCAATGTC ATCTCTTTTC TCACGAATCT TGGTCATGTG 37620 

TGGGAAGTGA CCCAGATCTA GAACCTTAGC CTTTGGGACT TAAAAAAAAA ACAAAAAACT 37680 

GTTGAGTTGA ATCATTAAGT GTTACTGAGG GACAGGAGAG AGGAGGGTAG CTTTCTTAGT 37740 

TCCAAGACAA ATTTTGTTAA CAAAGATCTG TGGGTAGACT TGTGTCTGGG CAAAAGATCA 37800 

GAAGATGTGC TGTTCTAGGC CTCTTTGCCC TCAGACCCAT TCCCTATCCT TTCCCCTTCA 37860 

CTGTACCCCC TTATCTCCTC TTCTGCTGTC TTCCTCTGGG CCTGATGCTT GAGGATCCAG 37920 

AAGTTTCTCA GGCTCCCATG TTCCAGCAAT CCAGGCCTCC TTCCCAGTAA GGGATGAGTA 37980 

CAGGGGCCAC ACATAGCCCT GCAAGTTTTG TAATCCAACT TGAAATCCAA TGGCAGAATG 38040 

AATGGTTATA TATGGTGTGA CCCAGGACCA CATGCAGTTG TATCACATGC ACTTACAAAA 38100 

GAGCCCCATT TCTTGGACTC ATTCCCAGAC TCAATCTCTC TGAGGGTAGG ACCAGGAATT 38160 

CGGCCCTTTT CACAATCTTC CCAGGTGATT CTCTACATAG TATAATAACA CAAACTCATG 38220

GAAATATATT TAATGAAAAA TGAATAAAAG AATAAATGAA ATAACAAATG GTGATGGCTG 38280 

GCACAATGTG TGTATCCATT CTCCTACTGA GGTGCACTTA CTTTGCTTCC AAATGTTCAT 38340

TTGACAAGTA GTGATGCATT GAATATCCTT GTACATGTGA GCATGCAGTA AAGTTTCCAT 38400 

GGGCTTATAT TTGCTGGATT ATGGGCACGT GCATCTTCCT CTTTTCTAGA TATTAACAAA 38460 

TCACTCTCCA AAGTATTTAT AACAATCAAC ACTCCTGAAC AAGCAGTGGG TTGGAATTCC 38520 

TTCCTCATCA CATCCTGGCC AACAATTATT ATCATCAGAT TTTTTAATTT TGCCAATTTG 38580 

AAGGAAATGC AGTGGCTTCT CATGTGTTAG TGTTTCTGAT GATCAGTGAG GTTGAGTGTC 38640 

ATTTTTTTTT TTTTTTTTTT TTTTTTTTGA GATGGAGTTT TGCTCTTGTT GCCCAGGCTG 38700 

GAGTGCAATG GTGCTATCTT GGCTCACTGC AACCTCCGCC TCCCGGGTTC AAGTGATTCT 38760 

CCTGCTTCAG CCTCCCAAGT AGATGGGATT ACAGGCATGC ACCACCATGC CTGGCTAAGT 38820 

TTTATATTTT TAGTAGAGAC AGGGTTTCAC CATGTTGGTC AGGCTGGTCT CAAACTCCTG 38880 

ACCTCAAGTG ATCTGCCTGC CTCGGCCTCC CAAAGTGCTG GGATTACAGG CACGAGCCAC 38940 

TGCACCTGGC CGATTGAGCA TCTTTTTATG TGTTTAATGA TGCTCATTTT TTATTGACTT 39000 

CCTTCTGTGC TTTCTTTTTT TTAGCAGTGA ATTTGAGTTG TAAGAATATG TATTTCTTTC 39060 

ACTCTGGGAT TCACCTACAT AAAGTAATTT TCACTTGAAT GAAAAAGAAA TCAGTTGTAT 39120

AAACATCTGT TTTTTCTGAA TTTTACTGGT GTAAAAATGG CCACTCAGCC CTGGAAGAAA 39180 

CAAAGGCACT TTGCCAACTG AAGTTGCAGA TGGGAAATTT TTAGAAAGGT CCTGTTCAAC 39240 

CTCTGGAAGG GGAAGATCAT ATCTGAAAGT CAGGGTAATC CACCCAACCC AAATGTTTCT 39300 

TCTACTATGG GTTCTGAGGA TTCGTCCATG TGCTTCTTCT GCATTGCTGC CATCTGATTT 39360 

CCTTTGCTAG GCTCCTCTTG CAACTTGGGC TACAAAGAGG TGCTTCATAG TCCACAGTCT 39420 

TTGCCTCACC TTCAGTCTTG AGGTGGTCCC CTAGGAGTTA TTGGTAGTTG CCGCTGGAAG 39480 

CCATTCTAAC AAACCTGGCG AAGGCACAAA AGGATAGAAA GCCTTTAGCC AATATGGTGC 39540 

CATCAAAAAC AAACAGAGCA CGCTGCCCAG TCCTCTTCTG GTTGCCTTTA CTAATGCATC 39600 

AGTCATACTT CTTCTGCACT CGATCTTAGC CAAGAGGTCG AGAAGCCATA GTCATAATTC 39660 

TTCTGAAATT AATCTCTTCC TGCCCCACCT CCCCATCATC TGTCTTTGAA TTCCCAGGGC 39720 

TAGTACTCAT AAGATTATCT CTTTCTTCTC CTTTATGAGG AGACCCATTC TTTTTCACAA 39780 

ACCAGCCACA AAAGCAAGTG TCATTACCCC CTACCGGAAA TACCAGACAG AGAGTTCATC 39840 

TGGGGTTAGT TTCTAATCAA GCCTCCTGCC CGGGTTTTTC CTGCTCCTGT CTTGAAGCGA 39900 

CCACAGGGGG AGAGCAGTTT CCAAATATGA TCCCTCCTTT CCACTGTCAC TTGTCCAACC 39960 

CCGACCACTA TCATTCTTTT ATTTGCTTCT CCCCTGAGCC AGCCAAGAGC CTAGGTCAGT 40020 

GACAGGGCAG GCAGAAGAGA GAGGGGCTTC CAGGAAGGAG AGGGAGCAAC CCACAGAAGA 40080 

GGCAGCAAGA CAGGAAGGCG GGCAGGGGCT GAAAATCCAA TACATATCTA AGTACATTTT 40140 

TCTAGGATGG GCTTCTACAC TCAGCCAAAA CATATATTGC ATATTGTTTG TATTTTTTAG 40200 

AGGTTTACAG GTCTCCCTGA AAGTCCCTCT GTGGAATTAT AAACCTCTAA TAAAAAATCC 40260 

CAGGGTTAAA GAAAGGAAAA GATGAAGGAG AGGCCCACAC TCTGAAAGGA AAGGGTTCAG 40320 

CGACTCCTGG AAGGTTCTGG ATGGTGCTTC CTTGACCAAG TCAGCTGCTT CTTCTACCTG 40380

GTCTCCTTTG TGGTTCAGCT GGGGTGGGGC TTACTAGAAA AAGCTGTGGG AGGTGGTTGC 40440

TCCAACGTAT GGGGGCTGTC TGTAAGTGTA GGTGTTATCT GATGAAAGCT GCCCCGGGTG 40500 

AGGGTTTGTA CAGAAAGCTC CTGGTGGTGG GGAGATAATG TCAAGCTTCT CTCTCTCTCC 40560 

CAGATCCTGG TTGTATCCTC TGTCCCTCTC CACCCCCACC CACTCACCCA CAGACTTCCA 40620 

AGGAACCGGC GCCTGCAGAC ATGCCTCTCT GATGCCCTCC CAGTAACCCC TGGCAGGCAG 40680 

CACAGCGCCA AACCTCTTGG CCTTACCCCA CTGGGCCCAT GACCCAGTGG CTGTGCCTCT 40740 

GGGTCCTCCC TGTCCTGCAA AGAGAACTGG GCCCTCAGTC AGGTTCTTCT GCTCCAACCC 40800 

AGTGGCCACC TGTGCTCTTG GGGAGCTCGG GGGAGGCTGG GAAACTTTCA AAGAGCAGTT 40860 

AATCACTAAC TAGCTGGAGA TAAGAGAGAG AGAATGAAAC AATTGAGAAA ATGCCCAACC 40920 

CAGAGGTTAG TGCTTCCCTG CCTGCACACG CCAGAACCTG GCCCGCCCAG AGAAACTGGC 40980 

GATCAAACTG AGTTTGTTCA CTGGAGAGAG CTGACATACA GTCTCTAAGG GGCTGCAGTA 41040 

TCCCAGGCTG AGGTCCAGTG GCAGCCGCTG CCCCTTTCCT CCTAGGGCCC TTTCCTTCAG 41100 

CCATGCCTCA GCCCTGAAGA CAAACAGGAG CAGTTTTCAA GGAGCCCTTC CCTTATCTCT 41160 

AAGGTCTGGG CCTGGAATTC AGCTTGGCCC ATTTACTATG CCAGCTCTGT GCAGGGTGCA 41220 

GAGATCCAAG ATAAATCAGA CAGGGTCTCT GCTGTCAGTG TGCTCAAGGA AAGAGGCTTT 41280 

TAGGGGAAAC AAATCTAAAC GACTGCCAGC TGGAACTTCA ACTCTGTAAA GCAGCACCCT 41340 

GCCACATCTG CCTGCTGGAA CATTTTCATC TGCTGGGCTC ACGTAGCTGT GCAACAGCTG 41400 

GGGCTGGGGT CACATTCTGG GCTAATCTGA TGATTATTTT GGCTAGAGTG AGCTCATCCT 41460 

TTTTGTTTTC AGGAGCTGTT CAAGGGTGGT CTGATGGTTT GGATCAAGAC TAGCTGTATC 41520 

CCGGAGAAGA ATACGTTGAC TTTTCTGGGG TGGGGTCTGG GGCAGAAAGC AAGAAGGCTG 41580 

CCTTACTTCA AGGAAGGCTC TCCTTCCACC TTCTGCCCTC TGAGTGCCTT GTATGCGCAA 41640 

GTGACACTAG ACAAAGTGCT TAACACTTAT TACCTGACTT GAATCTCCCA ATGGCCCTGT 41700 

AAAGCAGGTA CTCCATTATC ATCACCACCC TTCTTTTTAC AGGCAAGAAA ACCAAGGCAC 41760 

AGTCAGTTTA AATAACTGGC TCAAGGCTGC ACGGCCGATA AGTAGCAAAT TTGGACTTCG 41820 

AATCTGGGCG CTCTGGCTTC AAAGTGTGCT GTCCATTGTT CAGGTTCTGG TCTGGTACTG 41880 

GCAATGTCAG CCACACCTGG AAGCTTGCTA GGACTATAGA ATCCCCAGCT GACCCCAAAC 41940 

TCCCCAAATT AGCACCATGA TTTTAACAAG ATCTCAGGTG ATTGGTGTAC ACATTACAGT 42000 

TAGAGAAACA CTGCCCTTTT CACATTATAT GGCTCTGTGC TCAGTACAGA TTTAATTTTC 42060 

TTTTTTTTTT TTTTATTATA CTTTGAGTTC TGGGGTACAT GCGCAGAACA TGCAGGTTTG 42120 

TTACATAGGT ATATATGTGC CATGGTGGCT TGCTGCACCC ATCAACAGGC CCCGGTGTGT 42180 

GATATTCCCC TCCCTGTGTC CATGTGTTCT CATTGTTCAA CTCCCACTTA TGAGTGAGAA 42240 

TGCGGTGTTT GGTTTTCTGT TCTTGTGTTA GTTTCACAAT CATTCTCAGA TTTAGCTTTC 42300 

AAACTATTCA TTCCACCTGC CAACAATTAG CGAGCTCCAG ACATTGTGCC AGGTGAATGA 42360 

TGGAGGTGAA GAGACAAATT TCCTTATAGA ACTTGGCCAT GCCCTTCATG CAGGCAGTGT 42420 

GTGGAGTGCA AGTCAGGACA CTTGGATCTA AATCCAGTGC TACCACCTGC CGGCTGCGAG 42480 

ACTGTGGCTG AGTCATTTCA CCTTCTTGGG TCCCAGGTTC CTAATCGGTA AAACCGGGAG 42540 

GCAAGCCAGA GATGTCCGGC CCCAGCAGCA TATTCTATGT GAACAGGATG AGGTGCCCAG 42600 

CAGGCAATCA GTGGGGATCT GCTGAATGAG GGAACCAGTA AATGAGTGAG TGAACCGATC 42660 

ATCCACCACA AGGAAAGAGC CCTCCATTTC CAAATGAAGA AAAGAAGTAT GCTAGTGGAG 42720 

GGGAGACGGG ATTATCTGCT GTGTGTCAGG GAAGAGTAGG GCCTTCCCAA GCTCCCTTAA 42780 

TACTAACATT ACACAGGGGT CCTCGCTTGC CCTTCTCAAT GGTCCACTCA GATGATTTCT 42840 

CTTGGCGAAT GTCTGCCCCA CATCTGTGTG TCACTCAGCA ACTTTGGCCA CCTATCCAGT 42900 

GTGAGATCTC TAGATCACAA GGTGGGGAAA GGGGTGAGGA ATGACCTAGA ATCCTGGCCT 42960 

CTGGCCTTAG AGCCTCACTT GTTAAAGGGA AAGGGGCAAA TAAGATCTGA ACATCAAAAA 43020 

TTATTTCAGC TTGCCTTCCC TCTCACTTTT CTCTGTCCCC TTCTCCTCTT GTCTTCCCTG 43080 

CAAACCACTT TGAGTCTCCT TTGGTTACCA AGATAAAACC AATCCACATT AACTATGGCT 43140 

GGTATTTTTT TCGCTTTTAC TCCAAGCCAG TGCATAGTGC ATTTTGCTCA CATTAGATTA 43200 

TGGAATCCTT CAAACAACCT GATGATGAGT GGGTGCCATT GATACCCCCA TTTTATAGCT 43260 

GGGACAACTG AGGCACAGGG TTGTTAAGCA GCTAACCTGA GGCCACTCGG TCACTTCCTT 43320 

GTGGTGGACC CAGGATTTGA ATCCAGGTTT GCTCAACTCC AAAGCCTGTG TACTAAACGA 43380 

CACTTCCTGC CTTGATAAGA TAATTGTGGT TGTTACTTGG CCAAATAAAA AGCCTATGGA 43440 

GAAGTTGTTT CCAATGAAGC ATATCAGCTT CTAAATCTGG CTGAACATTG GACTCTCCAA 43500 

AGGGGCACAA AATACAGCTT TCCGGGCACC ATCTTGAAAT GACTGATTCA GCAAATTGGT 43560 

CGTAGGCAGC GAGGCACCTG TAGTTTGGTA AAGCTCCCAG GTGATTCTGA TAATGAGCTT 43620 

GTGCAGAACC CATTTACCTA AGGAGAACGC GGGTTCAAAG GGACTGGACG GCTCTTCCTT 43680 

ATTTAGAGTA GGAGGCTGTT GGCTTCTGAG AATGAGGGCT AATTAACTTT GGGGAGCTTC 43740 
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CTGCAGTGAC 
GGCGGCTAGQ 
TCCCTCCTGG 
AAGGTGATAG 
CACCTGTAAT 
CGAGTTTAGC 
CTGGGTGTGG 
CAAACCCAGG 
ACAACAGAGA 
CTAAAAGTGA 
TTCTAGGCTG 
CATTTACCAC 
GAATGTGCAT 
TGGCTACACC 
TCTCATTGGA 
CCTCCTCTAA 
TGATGTGATT 
AAGTGGCTTA 
TTTGTTTTGT 
GGAATATAGT 
GGAATATAAA 
ACTCACCCAC 
CTTATTCCCA 
CAGGACTCTG 
AAGGGGATGC 
AGTGCCCAAA 
GCTTCAATTC 
TTGTTGAGAC 
ACCATGTTCC 
AATAATAATA 
TGACTGAGCC 
CTCAGCCCTT 
GGGAGTTGTG 
CTGCCCTTTT 
AAACTGGCCA 
TAAGTTCCAT 
TCATTCTCCA 
AGTCCAGCTC 
AGAGGTGAAG 
GTGAGCTCAT 
AGCAGCAGTT 
GCAATAATGT 
GAAATGAAAG 
AAATAGGTAG 
ACTCCAAAGG 
GGCCAGGAAG 
ATAGTGGTCT 
AAGATGGTGA 
TTAGGATCTT 
CAAGCCACTT 
CTTCTCCCTT 
GAGAAGATCT 
TGGAAAAGCT 
CCACCAGGCA 
TGTAAAATGG 
AATTGCTTTG 



CTTTGCCTTC 
AGGAAGGGTA 
AGATGGTGTG 
GGGTGGGATT 
CCCAGCGCTT 
GTGGCCAACA 
TGGTGCACTG 
AGGCAGAGGC 
GAGACTCTGT 
GGCAACCTTA 
AGGGCCCCAC 
TTGTCCTATT 
TTATTTACAA 
TAGAAGGTAA 
TCCTGCAGCC 
AGTCTGTGAA 
CAATATTGAT 
TAAATACCGC 
CTTTTAAAAA 
AGGTGTTCAC 
GTCAGAGGGT 
TTAGTTTTGA 
TTCCTTGGGT 
AACTGCGTGC 
CTGATGGTGG 
GCCTGATTCT 
TCCCACTGAG 
CAGATTCTGT 
CTGTGAGTGC 
ATAATAATAA 
AGCCTTGCCT 
TGGTTCTATT 
GTTAGAACCT 
CAACATGCCT 
TGTGGACCAA 
TGCTGAAGTC 
TTCGTTCAAT 
AAGAGAGAGG 
ACTCTGGAGC 
CAGGAAAGGC 
AGCCAGGGAA 
GGGGAGGAAG 
GAGCCAGTAT 
GTGCCAGGTC 
GCATGGGAGC 
GCCTCTGGGA 
TGGGCAGCAC 
CAAATGACTC 
GCTTACCTGG 
TTTAACCTTT 
CCACCCTCAA 
ATCTGACTTT 
CCCTGGTGCT 
TTTGACATGG 
GAATCATGGT 
TAAACTGCAA 



GGGGAAAGTG 
GGGTGTTTGC 
CACTGAGTGC 
AATTAAAATA 
TAGGAGGCCA 
TGACGAAACC 
CAATCCCAGC 
TGCAAGTGAG 
CTCAAAGAAA 
GTTTTTCTGG 
CTAGTTCAAG 
AGACTCTTAG 
GGCAATAATA 
CTGTTAATAA 
TATGCTCCTT 
GGATTGAATT 
TAATTCCAGG 
AGCACCAGAA 
TCTGTTCATT 
ATATTTATTG 
CCGACTGGTG 
GAACTCTGGT 
GCACGTGTTG 
CCTCTGCACA 
CCCCATGACT 
GGAACATTTT 
CAATCATGCT 
GTTCTACGAG 
TTATTTTTAA 
TTACTCCTGC 
GAAGGCAGGG 
ACCACCTTGT 
GCCAGAAATT 
GTCATTCAAG 
GGCAGTGGGA 
TGATACCTGT 
ACATATTTGT 
AAGTACATGA 
AGAGAGGCAA 
CCCCCCAGGG 
GACAAGGAGT 
CAAAGAGAAA 
GTAAAAATCA 
ACAGAGGGCC 
CATCAAAGGG 
TCTCCTCTTC 
CAAACTGGTG 
ATCCTCAAAC 
TTTGCTGGTG 
CTTCTACACC 
ATTTCTAAGC 
GGCCATGGAA 
AGATATGGAC 
GCAGAAGCAC 
GATGGTGTGA 
AGCTCTGAAT 



TGGGGATTGA 
TGTCAGGCTC 
AGTGGCTGCT 
TCAGGCAGTG 
AGGCAGGTGG 
CTGTCTCTAC 
TACTCGGGAG 
CGGAGATCAC 
AGAAGCAGTG 
GTCTTTAGAA 
CCTTCTAAAC 
GTCTTTTTTT 
TCACTACCTT 
ATAGGATGAA 
TCACTGAAGG 
AAGAGAATTG 
TTCACCTATT 
TGTAAACTCC 
GCTTTATTCC 
ACTACGTGGA 
ATCGAATGCC 
GACCCAACCT 
CTGTGAGGAT 
GGGAAACAGC 
TTTCATATGC 
CTTTGCTGTC 
GACATGAGGG 
TATTGGGAAG 
TAAAAACCTT 
TAATAATATA 
GAATGAATTC 
AAACCTGAGG 
TCTCACTATG 
ACTTACGATT 
TTTACGTGAC 
CATCTGCTGT 
GGATTCCTAA 
GATGTTACCA 
CAACTCAGGT 
AAGCTGTGTT 
AAATGTACCT 
GAGAATGGGA 
GGTGAGAGAC 
TTGTGAATAG 
TGTTGAACAA 
TCCAAACTGT 
TTTAGGCTCA 
AACAGAGCAG 
GCCTATGCAT 
ATGCCCTGCA 
CATGTCCAGG 
GAGGTATAGC 
GACCTGAGCT 
TTTTCCTCAC 
TATTTGAACA 
AAGTGTTTAT 



GATAAGAGAG 
CAGGCTTAGC 
GGAGAGTGGG 
TGGCTGGGCG 
ATCAC CTGAG 
TAAAAAAATG 
GGTGAGGTAT 
ACCACTGCAC 
AACCTTTAGA 
GCAGAAGTGC 
ATCCAGTGTT 
TTAATGACTC 
TAATGGAAAA 
ACCCAAGGCT 
GTGATATCAG 
GAAAGGGCAC 
ATCTAAAACC 
ACAAGGGCAG 
TAGACTCTGG 
CTCTTTTTAG 
TTCGTTCTGT 
ACAGCCTGTC 
CAGATGAGGT 

TGGGCCGATT 
TTTGGGCTGT 
TTCTAAATGA 
AGGCGGAGTC 
GGTGATGCAG 
GGTATACTGC 
AGGAAAGACC 
AATGACCTCT 
TTGTTCTGTT 
AATCAATCTT 
TCCTAGGCAT 
ACCCGCCAAG 
GGGGTGACAT 
AATGCCCCTG 
CACAGTGTGG 
GGGGACTGAA 
TGGGCTGGGG 
AGGCATGTGG 
GAATGGCCTG 
AGCTGGAGAT 
TATCATGGAC 
GGAGATGCAC 
GGCTCTGGGG 
GCTCACATGC 
GCATAGGAAG 
TTAATTGTAG 
CCTCCCCTTC 
TCTCGTTTTC 
AGGTATCAGT 
TCCAGTCCTG 
TGAGCCTCTG 
AGTTTTTTTT 
TTGGGATTAT 



AGAAATCCTT 
CCTCGTGGTG 
TGGAGAGATG 
CAGTGGTTCA 
ATTGGGAATT 
TAAAAATTAG 
GAGAATTTCT 
TCCGGCTGGG 
TTATCCCACT 
CCTTGGGTAT 
TTGCTATATT 
ACTTATTAAA 
TTAGCAACCC 
GGAATTAACT 
CCAACTGAGA 
ACATTTCTCA 
ATGTTACTGA 
AGTTTTTGGT 
AACAGTACTT 
ACTGAGAAGC 
ACTCAAGCCC 
CCACCTTCAA 
CATGGATGGG 
ATAAATTGCA 
TGTGAGAGAG 
GAACCTGCTT 
AGACCTTACA 
GCAGGCACCC 
TATTAATGAA 
CACTGGTCTG 
TGACACTGGT 
TTTATCCCTA 
CCATTGGTCA 
TGACAGAGAG 
CCGGTGGGGC 
CCACACCATG 
CTGCTGTGAT 
TATGTGCTGG 
TGGTGGCGGG 
TCTAAGGATG 
GGCAGTCTAT 
CCTGTTTGGG 
GAGGCTGCAG 
GCTGGACTTT 
ATTATAGAAA 
ACAGCTCCCT 
AGCTCACAGC 
GAGGCCCCAG 
AACAGAATGC 
TCTCTCTGCT 
ACCTGTGCCA 
TGGAGAGGGC 
GCTCTTGCAG 
TTTCCTCATC 
TTTTTTTCAA 
TAGGAACTGC 



43800 
43860 
43920 
43980 
44040 
44100 
44160 
44220 
44280 
44340 
44400 
44460 
44520 
44580 
44640 
44700 
44760 
44820 
44880 
44940 
45000 
45060 
45120 
45180 
45240 
45300 
45360 
45420 
45480 
45540 
45600 
45660 
45720 
45780 
45840 
45900 
45960 
46020 
46080 
46140 
46200 
46260 
46320 
46380 
46440 
46500 
46560 
46620 
46680 
46740 
46800 
46860 
46920 
46980 
47040 
47100
TTTGCTGGAA CAGTCTACCA GAGGGATGGA
AATATTTTTA TCATATGAGA TACAAATATG 
CAAATATATC TGTCATAAAA TTTAAAAAAG 
CAGCAACTAT TAATTTTTTG TTGTATATCT 
CAAAAAATAT AATCATATTA TAAACTTTGT 
AATCTTTCAG CATCATGGCA TATAGATCTG 
AACCATAGTT TATTTAGCAT TTTCCTTTTG 
TCATATATTT TTTCTGGTCT TTTGTACATA 
TTGCTGACTC AAAGGATACC AGCATTTTAA 
ATAAAGAGTG TACAAATACA CCCTCCCACA 
AACCACAGGC ATTACCACCT CTGCTGAAGC 
CCCAGGCTGG AGTGCAGTGG CGTGATCTCT 
AGTGACTGTC ATGTCTCAGC CTCTGGAGTA 
TGGCTAATTT TTGTATTTTT GGTAGAGATG 
GAACTCCTGG CCTCAAGTGA TTTGACTGCC 
GTGAGCCACC ATGTCTGGAC TGCTGAGGTT 
CTTGTAGCCC AGGCTGCAGT GCAATGGCAT 
AGGTTCAAGG GATTCTCCTG CCTCAGCCTT 
CCATGCCCAG CTAATTTTGT ATTTTTAGTA 
GGTCTCGAAC TCCCAACCTC AGGTGATCTG 
ACAGGCATGA ACCACTGCGC CCAGCCTTGC 
CTCCCTCAAA TGGTCATGTG GCCACTGCCT 
GCCTGTTCTT ATATAACACC AGTAGGTAGG 
GGTAAGAACC AGCCCTAGGG TATTTGGGAA 
CATTTCCAAA CATAAAATCT AGCAGCAATG 
GCCTGACACA AGAAATTATT ATTATTGTTG 
TTGTGTTTGG AAGGAGGATA GTCAGAGAGA 
GTTTTAAAAA GTGTGTCTTT GTCATTGTCG 
TCAGGCCTTT GGTGTTACAA AATGCAAAAC 
GTATCCCATG ACCTCTGGTG CTGTGTACAG 
CCTGGGTGGG GAGGGCGAAA GGGGCCTCTC 
TGGAAAGTTC TGGAACTGGA TCACTAAGAA 
GACCAAAGAA TTAGGAAACT GAGATTGGAG 
AATTTTCATA AGCAGGGACA GAGGATCAAG 
GAAATTGTGT AAGGTAGATG GCTATGTGCC 
AGGAATAATA CATGAAATGA CTGACAAAAG 
CCTCTTAACC TTATAGAGTT AGCACTGTCA 
CAGAAGCTCG GAGAAGTCAA ATTACTTGGC 
AAATCTGGAA TACAAACTTA GGTCTATCTG 
GATTTCAGCA CAGCAGGGTT CAACTTGGAG 
CGATAGTGGT AGGTTTTCTT CTGTCTTGAT 
GACAACGCGA AGGATCATGA CTTCCATTTT 
GACAGCTCTA CCACATTCCC AACCCTATGC 
CTAGTTGGGG GCAGTGTTGG GGGACTTGGG 
AGATTATGCA GTGTGGTAGT AGGGACTGGG 
TAAAATCCTA GTTGGTCCCC AGTGGAGCCT 
ATTAATTACC ATATAAACTA GACAGTCCTT 
TATCTTCATC ACAAGTTGCT GTCTGGCTTT 
CCTGCATGTC CCTGAGTCAC TAGCAGGCTC 
AGCTGTCACC CTGGAGAGCA GTGCAGTTTG 
GATCCCCTCT CTCTTCTTTC CAGGCATGCG 
GAGGATGCTA TCATAGGTGA CCTATGAGCC 
AGCTTTACAG CACAGGTTAT ACAAGTAGTA 
GGTTTCAGAC CTGGCTCTGT CATTTATCTA 
TTGTCTATGC CTTGATTTCC TCAACTATAA 
GAGCTTTTGA GAGGAATTGA TGCAAAGATG
AGGAGAGGAA CTGAGAAATC GATTCTTTGA 47160

TATCTATATA AATATAGATA TAAATATGAA 47220 

GATGAACCTT GCCCCCAATC TCACCCCTAG 47280 

GCCCAGACAC ATATAAAATA TATATTCAAA 47340 

TTTTTAGCTT GTTTATTCAC ATTACATGGA 47400 

TCTTTTTAAT ATTACTTCAT GGTCTAGGTG 47460 

GTAAACATTA AAGTTAGTTG CAATTTTTCA 47520 

TATCTATGAG AGAAATTCCT AGAAATAGGG 47580 

ATTTTGGTAG GTACTACCAA ATTGCTCTTC 47640 

AACAGAGTGC CTGCCTTCCA TGCCTGGACC 47700 

TTTTTCATGA GACAAGGTCT TGCTCTGTTG 47760 

GCTCACTTCA ACCTCTGCCT CCCAGGTTCA 47820 

GCTGGGACTA CAGGTGCGTG CCACCAAACC 47880 

GGGTTTTGCC ATGTTGGCCA GGCTGGTCTC 47940 

TTGGCCTCCC AAAGTGCTGG AATTACAGGC 48000 

TTTTTTTTTT TTTTGAGACC AAGTTTCACT 48060 

GATCTTGGCT CACTGCAACC TCCGCCTCCC 48120 

CCAAGTAGCT GGGATTATAG GCATGTGCCA 48180 

GAGATGGGGT TTCTTCATGT TGGTTAGGCT 48240 

CCCGCCTTGG CCTCTCAAAG TGCTGGGATT 48300 

TGAGGCTTTT AAAACCATGA AACGCTCCTC 48360 

GCTTCATCAC ACTGCTCCTC TGTCTGACAA 48420 

GCCATCCGAG ACATGGTTAT CCAATAAAAT 48480 

ACTGGCTGTG AGGGTTCAAT GGAATATTCA 48540 

GAGAAACGTA CTTTAAGCAG AGAGTTTTGC 48600 

TTATTGAAAG TTCTGACACA CAGATCTCGG 48660 

GGAGGAAGGT ATGAAGAGGT CGAGGTGTTA 48720 

AGCTGTGGCT GGTCCCACAA CCTGGTTCTA 48780 

ACCAGGCAAC CAAATAGCGT TTCCATGGAA 48840 

GTGAGACAGT GAGCACTCAG AAAGGGATGG 48900 

CAGCCTCTGC AACATAAAAC AAGGGGCCAA 48960 

GACAGGCCCC ACTGCTGGCA TGAGTGGGAT 49020 

TTGGTCACCA ATTCAACTGG CCCATTTAAA 49080 

CCAAGAGCAC TAGGGAGATG GTGATGAATG 49140 

GGGGAAGGAG GAGAGAGGAT TCAGAATTAT 49200 

TAGCCTTTTA TGTGTGTTAT GTAATTTAAT 49260 

GGATCCACAT TAAAAAAAAA AAAGACGAAG 49320 

CAAGGTCAAG GTCACACAGC CACATGTGGC 49380 

ACTTTAAACC AAAATGCTGC ATATAGCTTC 49440 

ATAGAGGGTG GTGTTATAGA TTACCAGATA 49500 

GAAAGATGAG CTATTTTTAT CCTGTTGCAG 49560 

TGAACTGACA TTGTAGATTT GTGTATATTT 49620 

CCTCCTATCA CTCTTTTTGA GAATACTGGG 49680 

CCTGGGCGTA TGCTGGGAGG AAAGGCAAGG 49740 

GGAAGTTTTT TTGTTTTTTG TTTTTGTTTT 49800 

CCAGACCTCC TCAAAGTCTT TGAGGTTGTG 49860 

GGCCTTGGTG TTGCCATTCC AGCCTGTAAT 49920 

GTTCTGTAGG TAGAGGCTCT TTCGTAGGTC 49980 

ACTTGTGCTT ATCCAAACTG GTGAATCATT 50040 

GGAAGGCGTG GGTGCGCCCA TGGAGAGGGT 50100 

TAAGGAGCAG TGGCAGAGAA TTACGGAACA 50160 

AGGCACGTAC ATACGTGTCA TCTCAATGAA 50220 

CACAGGGATA AACAGCAAGG TTCTTAGGTG 50280 

GAGGTATGAC CTTGGCCCAA CCTTCCTAAC 50340 

AATAGAGATA AAAATGGTAA CTGCATCCAA 50400 

CAAGTACAGT GCCTAGCAAA CTGAAGCACT 50460 






CCATGAGGAG TGGTGATGCG GATGCTAATG CTGATGCTGG GACAAACTTA CACCCACTTT 50520 

ACAGATGGGA GAACTGAGCC TCAAGTTGTT TAAAGTGGCA TAGCTAGTAA GTGGTAGACT 50580 

TGGGATGAAA ACCCCAGTCT GTTTCCAAGT CAGGAACCCT TTCCTCCATA ATGCCGTCTG 50640 

CATAAATTAG ACTGTTGGAC TGAAAAACAA TCCGTTCAAA CCACAAGGGT ACATTGGCCC 50700 

AGGTTGCTTC TATGTTTTAT CCTCAATCTG AAGCAATATA ATGAGCAATG TAATGAGATT 50760 

ATGTTAATAT TTACTCAGGG TTCTGGGAAA CCCAGAAGGG TTTCAGGGTA AACCATCTCC 50820 

CAGCAAGCAA GGGCTCGCCC GCTAATTCCC CTTTCTTCCA AGACTGATCA GATTGCCCAG 50880 

TGCCTAGTAA AATGCCAGTT TCCTTCTATG TGGAAGGGAG CAAAGCTGTC AGCTCCTGCT 50940 

GGGGCACAGG GAGAGGATGT TTCTTGTGGA TAGGTAGGTG GTGCTTAGGG GTAGAGGCTC 51000 

TGAGATCAGG CAGACATGGT TTCTATCTGT CCTCCCAGCA GTGTGTCCTT GGGTAAGTTA 51060 

CTTAATGTTT CTCAGCTTCA ATGTCCTCAT CTTAAGATGA GGGATTATCA TGCTACTTTG 51120 

TGGGGCCTTT GTGAGGATTA AATGAGATCT TAGTATCTGG CACATAGTAA GTGCTTAATA 51180 

AAAATAATAA GGCAGAGCTG GGTAGATTGA GGGTTTGGTT TACAGCACTT TGACAGCAAG 51240 

TTGCTTGTTT CCTGCCATTC AGAGACCCTG GCCAAACTAT GTCCATTGTG GCCACAAGAC 51300 

CATTGGCATG TCAGCCTCCA AAAGAGAGAT GACTGCTCAG CAGGCATTAA CCAGATCAGA 51360 

GGTTCTTTGA TTCAGCACAG TGCTCTCTTT TTGCACTGCT CTCAGTCTAC CAACAGTATC 51420 

AATCACAGCA ACCATTCATG GTGCAAGGTG ATCTCCCTAA ACTTACATTA TATCTTTAAT 51480 

CCTCACAGCA GCCTTGGGGG ATGGTATTAT TTCCATCTGT AGATGAGACA ATAGGGGCTC 51540 

AGAGATGGTA GGTAATTGCC CAAGGACACA TAGCTGTTGG AGAAAGTAGT ATTGGAGCAA 51600 

AATCTATGTG TGTGCATCTA GATTGACCAA CCTTCCTGGT TTGCCTGGGA ATATGGGGTT 51660 

TTCTAGGATG TGGGGCATTC AGTGCTAAAA TCAGGAAAGT CTAAGATGAG TTGGTTACTC 51720 

TATATGCGGC CTCTCCGTGG AGGGTTGGTT GGTGGGCCTG GAAAAGGGAT AGGGATAAGA 51780 

GAGAGAAGAG GAGGACGCAG AGAGAATGGC AGAAGCAACT CTGCACTGTT TCTTTCTGCA 51840 

AAGATGTCTT TTCAATTCAA CCTGCTTGTT CAGTTCAACA AGCAGGTTTG AATGCCCTCG 51900 

TCCTTGGAGG GAGTCACGTC AGGACTTTCC GGGTATTTGA CCGTGATGAA GAGCGCTGTC 51960 

TGCCAGGGTT CGCCAGGCTG GGTGTGGAAA AATGGTGCCC CAAACCAGCC CCACATGGCA 52020 

GAATAGGAAA CATGCTGTCA TCTTGCTTCA TCTGAATCTC CATTCCATGA GGGCAGGAAT 52080 

TGTTTTCTTT TTTACTTCTA TAGCTGAAGC CCCAGTGCCC AGAATATGGC AGAAACTCCA 52140 

GAAACATTGG TGGAATGTAG ACTATTGAAT AATTCCAAGT ACAAACCAAT GGTCCAGGGA 52200 

GATTTAGATT CTGATGAAGG CAATCTGGGG AAGACTGAAT GGAGAAATAG CATTGGAAAC 52260 

GGTTTGGATA CCACGTGTTG GGATCAGGAA GCAGAGGAGC ACAGAATGCT TGTGCAGAAG 52320 

TGACATGGGC CCACTGCACC TGGGGTGGAC CCTGTGAGGT AGAGTTGGAG ACCAAGGGCC 52380 

TGAGGACTGG ACATGTCGGT GGAGACCAGG TGGTGGAGGA TGGAGAATGC CATGCCCTCA 52440 

GGGAGTTTGG ACTGCCTGTC GTTAAGCCAT TTTTTTCTCC AAATTTCAAT CCCCCTCATT 52500 

CCATTGTCAC CATATTTGCC ATGTCTGTGT ACCTACCTAT ATTACTTATT TAACACTTTT 52560 

CCTTCAAGTG ACTTACTTTT TAACTTTACA TTTGTTTTCA TATCAAACAC ACATGGCTGT 52620 

TAAAATAAAA ATTACGATTT GAACTTAGAA TCATCTTGCC TACCACATGA GGTAGGTGTA 52680 

CTTCCCTCTG AGGACCACAG CTCCAGCAAC TGGGGAACCG ACAAAGATTT TTGAAAGAAG 52740 

AAATGATTCA GTTGCTTTTT GGGAAGACTA CACACGTGAG GAAGTACTGA GTGGAAGATA 52800 

TGTGCATAAA ACATTGGCGC AATTGTGACT AACATGGTAA GAAATATTAT CAACGCAAGT 52860 

TTGGGGGGCA TTTCAAAGTC TCTCAATGGT CATCCGGATG AAATATGCAA GAACTGCTCT 52920 

CTCTCTCTCT CTCTCTGTCT TTTCTCTTCT TGGTCTCACT TTGCCCTCTT TCCCAGCAGC 52980 

TCTGCCTTCT CCCCCATGCT TGCTGCCAAC AGCTCTGAGG AATGGGAGGG ATTGCAGTTC 53040 

AAAGAGTAAA CAGGTCTACT CTGAGTAAGG CTGTGGGCTG TGCAGTGACC CCCAGTGGGT 53100 

CTGGGTGCCT GGTAATGATG CCTGCACTGG CATGATGCTG TGGCTTTCCA GGCTTGTTTT 53160 

ACCTGGTTGT GCAAAGAATG TTACCCCCAG CCAAGGCTCA AGTTCACAGA CCATTGGCCC 53220 

ATCCCCTAAT AAGCATATTA TTCCCAGCTG GGCATTGAAC TTCCAAGTTA AGGTGACCTG 53280 

CCAAACTGGA AAGAAAATGG ATTTGCAAAA ATCAGATGTT TGCCAACAGC ACCATCCCCC 53340 

ACCACAACCA TAGACAATTG TGAGATCTAA AGTTGGACTC CCTGAGGTTT TCTGCCCTGG 53400 

TGGTTCTGGC AACTCCTGGA GAGCCACAGA CTGATGAATT TGAGGATCAT AAACCTTAAG 53460 

AAGACTTTAA AGTATTTTTG GCATTAATTG ACAAAGTCCA CAGCAAGCCA GGCATGCTCT 53520 

TCTCTCCCAC TCCCCTTGTC AGAGATGTCT CTTTCCCCTT GCTCTTCTTA CCCCATTCTT 53580 

TCCAGCATAA CCAAGCTTAA TAGCTTCCAT GTTTCCACTG TAAGGAAGTG AGCCGAGTGT 53640 

GGTTGGTCTG TTTCACAGCA GGGCTATCCT CACACGAAAA GTTTTCAGAT GCATTGACTA 53700 

TGCAGATTTT TGGCTCAGTT TGCAGAAGAC TTCCTTATTT CAGTTTTACT GTACACCCAC 53760 

CTACATAATA CTTTTTGGTT CTTAGAATTT CAGAGCTATT AACCTCTAAA CTTAAATCAA 53820
AATTCTCATC
TTTTTGGCCT 
CATGTGATTG 
ATGACACATA 
ATTCCAAGGG 
CTGAATGGAT 
TCAACTCTCC 
AGGTCTGGAT 
TGAAAGGCAT 
GCATACTAAG 
ACTGCCCTTG 
TGGATCCAAA 
GGTGGGTGGC 
GTCACAACTT 
TATTTGGCTC 
ATTTTATACT 
CATTTCCAAT 
TTCAAATTTC 
TTGTTTTGCA 
CCTTGGCCAT 
AAATGAATAG 
AATTCTTAGT 
CTGTTTCTTT 
TTTTTCATTT 
CCCAGACTGG 
AGCGATTCTC 
TGGCTAATTT 
GAACTTCTGA 
TGCCGCACCA 
CTGGCCACTC 
ACACTTAGTG 
CCTAATGCTT 
TCCCCAATAA 
CCTAGGCTGT 
CAGAGCCACC 
CAAATAATTA 
GTCTCTCATG 
CTACCAAAAT 
CCTACTAAAT 
TCACCCTTGG 
TGTAAAGGTA 
GCAGAGAAGG 
CTGCCCTACC 
CCATGGTAGT 
GTGAATATAG 
TTTTCACCAC 
TTAGCCACTC 
ATGTATATAT 
ATATATACAA 
ATCCTAAGTG 
ACAGGAAGGT 
GAGAAATGAG 
ATTCAAGAAC 
AAAGCTATTT 
AAGGAACAGG 
AAATATGGAG 



AAACTTTCCT 
AATTTCCTTG 
CTAAGAATAG 
TTGGGAGATT 
CTTCTTTTCT 
AATTTATTTA 
AATATCACTT 
CTCCTGCTTT 
TATGGGCAGC 
AATGAGGATG 
ACCTTCATAT 
TGCCTAAGGA 
AGAGTGACCT 
TTCCAATGGA 
TGAAGGGCAG 
TTTGTTTTTT 
TGCATTCCAG 
AGCCAGCTGC 
ATGACGCTGT 
GTTAACTTAT 
TACCTATCTG 
CTAGTCACTG 
TTTTCTATGC 
GTTTTCTTTT 
AGTACAATGG 
CTGCCTCAGC 
TTGTATTTTT 
CCTCAGGTGA 
CTGTGCCTGG 
ATATGAAACT 
GAGGGAGTAA 
TGTACTATTT 
ACAGATAGGG 
CTGGCTTTAA 
TTGCTTTTGT 
GGCATCTGGA 
GTTCCTGGGT 
ATCTTCTTGA 
ATGAATGGAA 
AGGTGTTTGC 
ATTTGGATGC 
CTGGATGCCT 
TCCGGTCCAT 
AAACAAAGAG 
TCAATAACAA 
ACACAAAAAT 
CACAATGTGT 
ACACACACAC 
ATTTTATTTG 
CTATACTTAT 
TGCCTGAAAA 
CCTTCTTGAA 
CAAAATGCTT 
GTGGGGAAAG 
GCAGGGGCAA 
GAGGAGTCCA 



AGGGCCTTGT 
TCCAGGAAGA 
GGGTGGGGGA 
TCATCTGAAT 
GACATTCCAC 
TACAAATGAG 
CCAGGGGGTT 
CTATCTGAAA 
AGTCTGGTTG 
GAGAAACTGA 
TTAATGCCTT 
TCTCCCTGGG 
TTGTATCACA 
TTTCCCCTCG 
GGGCTAAACA 
CATTTCAACT 
AACTGAGCTT 
TAAGCTTCTC 
CTGGTCTGAG 
CAGGCTCATG 
ATGGAGTTTT 
GGAAAAGATG 
CATTCCGGCT 

CATGATCTTG 
CTCTCGAGTA 
AGTAGAGATG 
TCCACCCACC 
CCTCTTCTAC 
TCATTCCCTG 
ATAAGCATTT 
CTCATTTAAC 
AAACTGAGAC 
AAACAATGTC 
AAGTCTGTTG 
TGGAGATTTT 
GAAAGAGGCC 
GTGAGTTCTC 
AATGAGGAAT 
TGATTTGGTA 
TTCATTAGCT 
ATGAGGGTAG 
ATATGGCTGC 
TTCTTATGTT 
TGTATCATGT 
GGCAAGTATG 
GTGTGTGTGT 
ATATATATGA 
TCAATATAAA 
AAAGAAATCT 
TGGCCACCTT 
GACCCTGATG 
ATCGTAGCAG 
ATAAGGCCAG 
ATTAGTTTAG 
GAAAATTCAT 



CATAAAAGAA 
CAGATTGACT 
GAGAGAGGGG 
TTCCCCTGAG 
CAGTGTGCAG 
TCTTTTTGAA 
TGTAGATGCA 
GGCACTGGTC 
ATTTTATACT 
TAGTGACCCT 
TGTTATCACA 
GTGTTAAGCT 
AGAGCCTCAT 
ATTCTTATTC 
TTGTTCTGTA 
TTCCGATCTC 
GACTTTCCAT 
TTTTCTGGAG 
CTCCAGCTAC 
AGGCTCGGTT 
TCTAAGGCTT 
AAAACTTAAC 
TCACCTCCTT 
CTTTTTTGAG 
GCTCGCTGCA 
CCTGGGATTA 
GGGTTTCACC 
TCAGCCTCCC 
TTTTTCTTAG 
CTAAGGTGGA 
CCAGAGAGCC 
CCCCCAAACA 
CTAAAGTTTG 
CTTTCACCGC 
GAATAGGCTC 
ATACATTTTC 
AGGCCCTGAG 
TGGTTGATCA 
GCAAAATGGA 
GATGTGTGGA 
TAGAAAGGAC 
GGAAGGGAAA 
ATTTCTTTAA 
AAAACAATTG 
ATTTGCAAAT 
TGAGGTAATG 
GTGTGTGTGT 
CATGTCAGAA 
AAGAATAATA 
TCCTCATACA 
TTTCATGATT 
GAAATACTGT 
TGAGGTTGGC 
AAAGAGATTG 
GCAAGAATAG 
CCTTGGTGCT
ACTAAGTCTC
AATTCCAAAC 
ACAAAAGTCA 
TATGAAATTA 
GTGCATATGT 
TAGTTGCAAT 
TTCTTTCCAT 
TGGATCTCCT 
ACTTTGATAC 
CACCCCAATT 
GCAACTCTTT 
TGCTCAGTGC 
GACTTCCCAG 
TGAGCATTTA 
AGATCCAAAC 
GCTCTTCTGA 
GTCCATGTAA 
GGGATTGTGG 
TTGTCTTATT 
TTCTCATCTA 
AAATGAAGTA 
GAATATGAAT 
CTCTTACTTT 
ATGGAGTTTC 
ACCTCCACCT 
CAGGTGCCCA 
ATGTTGGCCA 
AAAGTGCTGG 
AAACATGGAG 
AGTATTGGAG 
CACCAAGTGC 
GCTCACTGAG 
AGCAAATATG 
ATCAGGCTGC 
TGAGATGCCA 
TACTTGGACC 
ACCTTTACCC 
TCTGTGGAAC 
TGGTTTTCTC 
GGAACTCAGG 
ACAGCAGGGA 
ACAAGGGGGT 
TCTCTTTTAC 
CTATCTAATT 
TGCTAAGAGA 
CCCATGTTAA 
GTATATATAT 
TGTCATGTTT 
CCTGGAAAAA 
AAAAAGAAGA 
TTCCCTCCCT 
GAAGAAACTA 
TTGAAGTCAG 
ATAAAATACA 
AGGCGTCTTG 
TGGGTAAGTT
AAAATAGGAC


53880 


CCTGGACTCA 


53940 


TAGACATGCC 


54000 


TTCAGAAATA 


54060 


TTTGAATGAA 


54120


GGATGTGCTG 


54180 


GGGCATCAGC 


54240 


GCTTTCTATC 


54300 


ACTCTCAATT 


54360 


AGGTTTCACT 


54420 


GCTCTATTTC 


54480 


TATTAAACCT 


54540 


GCAAGACCAA 


54600 


GCTTTTTAAA 


54660 


CTGCTTGTAT 


54720 


GAAACATTCA 


54780 


GATCTTGTAA 


54840 


TTAAGAGATC 


54900 


TACTGTGCAA 


54960 


TAAAGTGAGA 


55020 


ATGCAAATTA 


55080 


AGTCACTATT 


55140 


TTCCCTTTCT 


55200 


GCTCTTGTTG 


55260 


CCCAGGTTCA 


55320 


CCACCATGCC 


55380 


GGCTGGTCTT 


55440 


GATAACAGGA 


55500 


GGTTAGTTCT 


55560 


TTCAAGCTCT 


55620 


CATGCAATCT 


55680 


TATGTTAATA 


55740 


GCAAAGTTTT 


55800 


TTCTGAGGAG 


55860 


CACGTTATCC 


55920 


TGAGTTTGCT 


55980 


AAGGTTGGCT 


56040 


AATGTGGGAG 


56100 


CACTATCACC 


56160 


AGTCTGAATT 


56220 


GAACTATATA 


56280 


GGGGCTGTAG 


56340 


TTTTGGGATT 


56400 


GTACAGCATG 


56460 


GTAGATTGTG 


56520 


TTAGCTCAAT 


56580 


ATATATGTTT 


56640 


TATTCCATAA 


56700 


CAAAAAAAAA 


56760 


AATTCTGGCC 


56820 


TTCTGAGACT 


56880 


AGACAGTTGG 


56940 


GGAACAGTGT 


57000 


GGCGAGACCA 


57060 


ATATTAATTA 


57120 


TAGCAACATG 


57180
TTCAGATGCC TGAGTTTTGT GTGTGTATGT GTGTGGGCAT GCACGTGTGT GTGTACACAG 57240

TGGGTCATTC TTCTCAGGAA GAGTGAGCCA CTCTCCCCTC CTCCAGCACC AAAGTGGCCC 57300 

CCACCTTGGC ACGCCAGTGG CACATGCCAT TGGGCCAGGA TTTGCTCAGA ATGCAGGCAC 57360 

ACAGACATAA TGTCAGGAGG CATTGCTGGT GTGTGTCACA TCAACCTGTT AGAACAACTG 57420 

TCAACGTGTG ACCTCCCAAA CAGAACTCAG GTGCCCCCTT CAGAGACCGT AAAGCTTGTC 57480 

CTTAGAGGAT AATGAAGATC CCCAGGAACC TCATCTAATC CAAAACCAAA AGATTTGGGA 57540 

AATGTGACCT TTAGAGGGGA GTAGCATTAA GAAGCAAAAT GATACTTATT AATTCTGTTG 57600 

CTTATTTGAC TGTAACCAGT ATAATAAATG ATCATATTCT GCTCGATTTA ATTCCCCCTC 57660 

CCCATAAGTT TCACAAGACC AGAAGGAGTT TCTTCTTCCC ATTGGTCTTA CATTAATATT 57720 

CTTGTACGGC TTTCACTAAA TAGATGCCGT GTTCTGCCCT GGAGGTAACA CCACGTCATT 57780 

AGGAGGAGAT GATAGACAGA AATATATACA AACACACACT TGCTTTCAAA AATAAATATA 57840 

GGCCCTCTAG TTAAAAGGTA TTGTGTAAAG TGTGTGAGCA TCCTCTTTCT TGCAAAGCAA 57900 

GCACACAGCT TCCATTAATC TTGTAGCCAC AGCCTGTGTT GGTGTTAAGA CTCAGATTCC 57960 

TTAACGCTTG ATACTTGGCT TAAAGAGATT CTTTGTCCTG GCCTTGATTT GGGAATTAAG 58020 

ATCCCTAGGG TTTTTGGTTT TACAGTATGG ATCTTCTAGG AGACAACCCG ACTGACCTCC 58080 

GGGTCTCCAG GCCACCACAC ACAACCTGGT TTGCTTTGCT CTGTTCCCCT TTTCCTCTGT 58140 

GGGGACCAGC ACAGGACTCA ACTCAAGGGC TCTGTGTCTG TGCACAGGTT GGAGAGGGTG 58200 

ATAGGGCCTT GACCTGTAGG GACAACCAGG AAGATTTCTA TGCAGAGTAA TTGGGTTTCT 58260 

AGAGTTTGTT TCAGTTGATT TGAGGGCAAG CTGCTTGGCC TCTCTCTCTT GATTCTTCCC 58320 

ATCCACAGAA TAAAGACAAT CAGCTTTGTT TATCACTCTG TTCATTTTGC TATGTCTTTA 58380 

TCAGCCCCCC AGAGAATTCA GGAGCACAGA ACAAGTGCTG GAGGTCTCTC TTGCCAGAGT 58440 

CCTCCTTGAG AACTTACAAT GTGTCCATAT TAAGGATCTG CTGTGTTTGA TGATTTTGTG 58500 

ATTACACTTT AAACTTCTTA TCCATAAAGG ACATACTTGA TATATCTGAG ACTTGTAGTA 58560 

GAAGGCCTTG AGACATCCAT CTCATCCCAT CATTATCTAT CTATCATCTA TCTATCTATC 58620 

TATCTATCTA TCTATCTATC TATCTATCTA TCTATCATCT ATCTATCTAT CGCCAGTACT 58680 

GTCTTGTTGA AGTTGGCAGT AGGGTGAAAG ACCTCAAACT CCAAAGGACT TTCCGTATGG 58740 

ATGCAATATA CCTGCAATTC TAGCTTTTTT GTGTTTTTTT TTTTAGGTTG GGGGTGAGGG 58800 

GTATTGTTTT CATTTTTGTT TTTCTTCTGG AAGGTTCAAC TAAGACCCAA GTAAAAAGAA 58860 

GAATCAATAC TTAATAAGTA CCCAGCAAGT AGCAGGCACA CTTTTAGGTA CTTTATTTAC 58920 

AAAAAAACCT CCACAAATAA AGTGGCTTGT GAGTATGAGG TGACATCTTT CCCTCCCCTC 58980 

CCACCATCAC TACCCCAATA TGACTCGTCT CAATAGCCCT CCAATCTAAA ATGGACTAAA 59040 

TACAAGTGGA TAAAGAAATG GAGATTTAAC CAGAATTCTT CAGCTATAAA TTACAGGGCC 59100 

TATAATTAAA GGTGATTGGG ACTGGGTCAG AGAGCCACAT CACTTTTGTG GTTGCATTTG 59160 

AAGTTCACTA TCTCTTGACC ACACAACCCT AGCCCTTCTA CTCCCACCCT GCTGTCTCAG 59220 

GTTAATCTCA GGCAATGGTG TAAAGAAGGC CAAGTTTGTT TCCCTGGAGT CCCACGGGCT 59280 

CTAGCAATAA TGCTTCCCTT TTCTCATGAG TGCCCCGCCA CCCACCCCCC TTCACCATCA 59340 

CTACACACAA ATGCCCTGCA GTGGGTGGAA TGTAGTTACT TCAGGTTGTG CCTGATTTGT 59400 

CTCTCAAGCA AAACTCCAGC AGGCCATTCC CTCAGGGCCC TGCTCTCAGA TCTGGAACTG 59460 

ATAGACTAAT TGGGGCTAAT GTGATAATGG GAAATAATGA AATTTGTTGT TTTTATCAGT 59520 

GTGTATATGG GGCGGGGTTT ACATTTGCAT TTTCACAGGG CCCTTGGCAA GTTCACAGGG 59580 

TTGAACAGTT GGGAAGGGTG GGAATGTCTG GGGCAGGTTA GGGAGGCAGA GGGATTTATT 59640 

AGAACTCCCC TAAACTGCAC TGACCAAAGC CTCAAGCCCT TCTTCAAGAC CTGCCCAGCT 59700 

TCCAAGACCT TCCCAAGTCC ACCCTTGTTT TCCCACTGAG TCTTTTACAC TTTCAGAAAC 59760 

CTCTGAATTT GTGTAGAAAC TAGAAAAAAT AAGTAAGAAA AGACTAATAC TACTGCACAC 59820 

TCACTGTTCC CCCTTAATAT AATAACCAGT TTTTATTCTA TTCAGTCAGC CTTTGACCAT 59880 

AAGCAGACCT TTTTTTTTTC TTTTTAACAC AAGTAACTTC TTGGTTTTGA TCACAAAATC 59940 

TTTATCTCTG CCAAATCTCA ACTTCCCTTC CCTCTCCCAC AAAAGGGAGG CCCGTTGAGT 60000 

CAAAGAAATC TGCTTAGACA CTTTGCTCAT GCCAGGCCAG TGTCCTGGAA GGTTCAACAG 60060 

AGAGAGTTAA TGGTTGGGGG ATGGTATTTT TCTTTGCTAG GAGCAGTCAT TCACCCGTAT 60120 

GGGAGAAGGT ACATTTGTGA CCCAGTGAAG CAGGTACAGG TAACTCCCCA TATGTCCCTT 60180 

GGCCCAAGGG AATAGAGGTT GCCTGGGTAT TTGAATCCGT AGATCCTCCC TAATATTCCA 60240 

CCTTCTTCTT GTCCAAACTG TGCTTTTTTA TTTCCAGTTT CAGCATTTTG GTCTTCTCAT 60300 

CTCTAACTCT TATAGGGAGT GTCAATAAAC CTTTTAAAAA AGATCATGTA AGTGTCAAGA 60360 

GGAAGTGAAG AACCTAGATA ATCCACCAAC CGGATAATCA GCTCTTGCAT ATTTGAGAGT 60420 

TGACTGCTTG ACCTAAGCAT CTCCTCATAA GGTACCCTCC CTCCCAGGAC CTTCCCTTTC 60480 

AAACCTCTCA AGGCTCTTAC CTGGGGCCAG GGGAGATAGG CTTTTCAAAG TCCATTGAAT 60540
TGCCAAGAGT CTCTGTCAAG AAGGCAGTCA TGGTGCCTGG AGAGGGAACT TGCTGGGAGC 60600

CCCTTCAGAG CCTGGTACTT ATAGAGCTAG GGAAAAGATC TTGATGCCAA AGCAGGGTGG 60660 

ACTAAATACA GACTAATAAA TGAGACAGGT GCTCAAGAGG GCCCCTCCAT ACCATCATCT 60720 

CCTCCAGATT TGGACTTCTA CTCACTTTGC TTTTACATTC CCTCTTCCCG ATGGTGTCTT 60780 

TGGTGAGCAG GGTGCTTTTC ACCTGAAACA GCCTCTGAGC TGAAAAGAAC AGTCACCACC 60840 

AAATCAATTC CTCATCCATT AACAGGTTGT CTCTCTGTTC TTGAGACACA GGCATTACCT 60900 

GGTTAGACCT GTTTTGTTTG AACACTAACG TGTGAGTTGG CCAAATGCAA ATGAGCCAAT 60960 

GTTTGTAATC CTTTATTTTA TTTTTTTAAA GGGCTGGGTA GCCAATCAGA AGAGGGGGAA 61020 

GTGACTTAGG GAATTCCCGG TTGGTGGCTT ATTGCTTAAC ATCCTACAAA ATGATTTAAA 61080 

ATTATTGTTA TATGCATTTA TCTTCACTCT GATGAGGGCT CAGACTTGAT AACGCCCGTG 61140 

GTGCCCCATC CCTATAGGAG CTGGTGAGAT TGCAGCCTGC TGCCTCCCCT CCATCAGCCA 61200 

CAGCTATTGG ATTTCCCACC CAGAATCTTT AGGTAAATGA GGTAAGTCCT GATTTTTAAA 61260 

ACTTCTTTTG AATCTGGAAT CCAAACACTT GAGTGGAAAG AGAAGCCTGC TTTAAACTGG 61320 

ACAGATGAAA CTAGAACAGA CTCTTGGAGA CGGCTGGCAG GAAGTGAAGC TCACCTTACC 61380 

TGGGCTTACC TCACTGGGTC AAATCAGAAT TTTATTTTGG AGGGCAGGTT GGCTACTTTG 61440 

GATATTATCT GTGAATTTCC TGCATTGTCT GGACTTCTAA TCTCTGTGAA TTTAAAAGCC 61500 

CCCTCGTTTC CCTATGCCTG GGTGGCAAAA CCATTCCCCT GGGTTGAATT CTTCTGGAAC 61560 

AAATAGGCAG CTAGAGATAG GTGGCTCTGA TATAGCTCAG AGAAGAAGTG GTTGGCTAAG 61620 

TAGCTGTTAG GGCTCAGAGT ACACGGTCTC GCTTTCTAGA GATGTCTTCT GCTGGTAATT 61680 

TTTCTGACTT ATGAGCTACA TGGAAAGGCC AATTTGTTTT TAATATGTTC CAGGACTGGA 61740 

AAATGGCTAG AAATAGGCAA GAACATACAC AATCACACTG GAAAAAGTGG CCAGGCAGCC 61800 

AAGGCAGGCA GAGGTATTGG GGAGAGCTGA ATATCTACAA AAACAAAAAT TCAGAAAAAA 61860 

CAAAAATCAA TTTTGGCAAA GGGCTTCACT GTATAACAAG GGGACAAACT AACCCTTTGT 61920 

TTACAAACTA ACCCTTTGTT TACTCCATTT TGTCCAGAAA ATACAACAAT CAGTTTTGGC 61980 

AAAGGGCTTC ACTGTGTAAC AAGGGGACAA ACTAACCCTT TGTTTACTCC ATTTTGGGAG 62040 

ACTATGATCA GACAGGCAGT TGTGACTCAG CAGCAACAAA TGCCTTCTGA GACAGGGATT 62100 

CTTTTGATTT TGCTTGGACA TTGTGGAGAA GTGTTAGCCC CAATGTGGAC TGATCTGGGA 62160 

ACAGTGGGAA ATTAACTTCT TGTTGGCAAA TATCAGGCTG AGGTGAGAAA GCGACATTTT 62220 

CACCGTCCAT CTTTGCTGAT TTACCGTGCT CCCAGGATGG TGGGAGTGTG TGTTTTTAAG 62280 

ATGGAGAGTG TATGCTTCTG GGTTCAAGTT CACAGGTGTC TCTGCTGGTT ATCTGCACTC 62340 

ACCTTGGTAA CAGGGAGAAA GTGAGTGAAT GGATTCCAAG AACTTACTGA TGGAAGTCTA 62400 

ATTCAGGAGT TGGTTCTGCA GCCATGGAGG TAAAGATGTG TTGATAGTCT TTCAATGTGT 62460 

AAAAGGGCAA TTAGAGATTC TGTGTGACTG TGTGTTAATT CCACTGGGGT CAGGGGAAAA 62520 

ATTTATTTCT AACAGAAAAG AAGAAGATAC GTTATTAGGA AGAATTTCAT GGCTAGGAGA 62580 

TACTATCAGA AAAGGCTCTT AAGAGATTTT AAGGATGACT TTAATAGCCG CATTTGAAGT 62640 

TTGCAGAGGA TCCACTTTTC CTCTTTTTGT GACCTAAAAT TCTGGGATGA TGAAATAACT 62700 

CACCAATTCC ATCTTCTTAT AATATGGAGT CATGTAGACA ACACCATTTT CACACAAATG 62760 

GCTAATGGTA TTTAAAAACC ATGATGGAAT GTGAATTGGG AGTCATTTGG AGGTCTGTAG 62820 

TTGAACTTGA AAAAATAATA AATGTAATGG AGACAATACT TCACCGTGTT TCCAAAATAT 62880 

TTTACAGAGG CATTTTAAAT GAAAGTCACT TTGAGGGAAC AGCTGTGCTG TAAGTTCTCT 62940 

TACATGACTG CGCAAGATGG TAGCCTTCAT CAAGACCTCT CAAGGTAGTG TGGGTAGGGT 63000 

GACGTGTTTG ATTCAGGCCT CGTTTGTTAT GAAAAGGCTC AAATTCAATT GTATTTGTTA 63060 

TTTTTTTGGT TAAAAAGCAC CTATTTGTTC AATTCAAACA ATCCTTTTTG GTTTTTTTTT 63120 

GAGATGAAGT CTCCGTCGCC CAGCCTGGAG TGCAGTGGCA TGATCTTGGC TGACTGCAAC 63180 

CTCCGCCTCC CAGGTTCAAG TGATTCTCCC AACTCAGCCC CCCGAGTAGC TGGGATTACA 63240 

TGTGCTCGCC ACTATGCCCA GTTAAGTTTT GTATTTTTAG TAGAGACGGG GTTTTGCCAT 63300 

GTCAGCCAGG CTGGTTTTGA ACTCCTGACC TCAGGTGATC CACCTGCCTC AGCCTCCCAA 63360 

AGTGCTGGGA TTATAGGCTT CAGCCACCGT GCCCAGCCAT ATTGTTTTCA TTTTTAATCT 63420 

ATTAGTCTAT CGTGATCTCC CAGTGGAAGT ATCTTTGGCC TTTGTGGACG TCAGGAAAGC 63480 

CCTACATTCC CACTCGCGAT TCCATGTTTA TGGGTACCCT AAATGCTCCC ATTAATTGAC 63540 

CAACTTTACC CTGATCTTCT TTCAATATCT TTCTGACTCC TTGAAGGTAT GAGACAAAAT 63600 

GGAAACTGAG AGGTTAAAAG GTTTACTAGG TTGCATTCAA TTAGCGAATT GGAAACTGGA 63660 

AGGAGCTCCT ATCGGGTCTC AGGTCAGAAC GTGAGTGCTT TTGGCCAAAG TTCACTTCTG 63720 

AGGAAGTAGA ATTTCGCTTT CTGGAATCTT GCGATATTTT ATTTCCTCTA TATCTTTCCC 63780 

ATGCCCCCGA CCCACCCAAT CTCCACAAAT TTGGGGATTT GAGCACTGGG TTGTGATCGT 63840 

TAGACCATCT TGCTTTTCTG AAAGCCCAGG GCAAGACCCC TGCTTCATGT CACAGTATCA 63900
AACACAGACA TAGAAGCTTG TACAAATTAT
CATATGGAGT CATCTCTATG CCCTTTCATA 
GGGGTGGGGT GGTGAGCAAG AATCCCGTGG 
GTTGTGGTTT TAAGGAACTC AATCTAAAGC 
CTGCCGTTTC AGCCCATTTC TCTTGTTAAG 
CATTTCTGTT GTGGGTGGGG CCCTGGTTGG 
CTAGAGACCA CATTCTCACC CTGCCTTTGT 
TTCAGCTTGG GTAAGCCGGG TCTGCGGCGG 
GCAGCCACAC CTTGCCCAGG TTCTCTTAAA 
AGATAATTAA CTTTATTTGA CATTTTTCAT 
AATAGATTTG TTCTAAAGAG CAAACAATCT 
TGAACGGGGT TTCAAAGGGC CAAGCTACTA 
GAGAATGGCT CCTCACCACA GCTATTCTTA 
TTGATGTCCG GAAACCAGGA TGTGCTGAAG 
TTGCTAAAAT GTTTATTATT CTTCAGAGGG 
TGAAGAATTT TGAACTCCAG CTTTGGAGTG 
TGAGGTGGTG TGCTGGACGG CAGTCTAGGG 
CCCAGAGCAC ACATTCCCCT TTGCCAGGCT 
AACCTGACCA TCTTTAAGAT CCATTTTAAC 
GTGGTGGGAG TACCTGGGGG TCAGCAGGAT 
TGGGAGAAGC CTTCTAAGGA TGAGGCAGGA 
TTAGGGCCTT CAGAATTTTG GCTAAAGGCT 
TGGAGCCTTT ATTAAGGATG TTAGAATCCA 
TTGCCTCATG TGTCTCAAGC TGGTGGGAAT 
ATGGCAGGGA ACAGCCTATG CACAGGGGCA 
CAAAGCAGCC CGCCAGAAAA TGGACATTTA 
GTACTGATGA GCTTGCTTGA CTGATTATCC 
GACTCACTTC TCCTAGAGGA AGACCCTGTG 
GAGACCCTGA TGTGCCCAGA AAGCGTCAAC 
AGGGCCAAGC AGGGTGGCCC ATTTAAAGAG 
AAAGAGACAC CCTGTAAAAT ACTCCTATGA 
CTGTCTTTGG TTTGTTTTAC GGGGCCGGGC 
CTCTCCTCCT CCAATCTCAC CACACACCCT 
TTTTGTTTGT TTTGTTTTTT TAGCATCTTA 
TAAAATATGG CCAGTATAGG TGCATACAAA 
ATAAAAAGAA ACCAAAATGC CTATTATAGT 
ATTAACCCTG TCATTTTACC ATAACAACTT 
GTCCTTTTTC TTTGATGTTG TTGCAATTAT 
CCCACTTTCT ATCACAGAAG CACAATAAAT 
ACTGTGGCTT CAAAACATTT CAGTTGCCTG 
CCCCAAAAGT CTGTTGAGCT TTTCCTGGAA 
AGCAATGGAG ATTGGAGAAT ATGGTCATCT 
AACCTTAAAA AAAATCTCCT CAGTAGGGCT 
GAGTCTGAAA GATGAGAGCA ATTTTCAATC 
TCATCTAGTC ATGAGGCTGA CTAATGATAA 
ATTTGACTCT AAATTCAGCC TCTGTCTTGA 
TAGCTCACCT TTTAGTCATC TTAATTCCAC 
GATGGAGTAA AATTTGACTG CATGTGTATC 
GCTTGGAAGC TGCTGAGATG TGGTGTAAAG 
TTTAAGCTGG ACTCTACCAC TGGGTAGTTG 
GCATCTAAGT TTTCTCATCT GGAAAATGGA 
TAAAGATTAA ATTAACAAAG AATGTGAAAG 
TAGTAATTTG TGGAATGAAC ATCAAGGGAA 
CTGTTGGGTC AGAGTGAATG GATATTAAAT 
GTGTGTGTGT GTGTGTGTGT GGCGATCAAC 
AATGCAATAG GGTTTTTATT GGGAAGGTGG
TGAGAAGTTA TTGTCTTTTC TCCCTTCCTC 63960

CAGATGTGAT TTACGAAGAC CTCTGGGTTA 64020 

CAGAATCTGC TAACACACTT GAGAAGCAAT 64080 

TTGAACCTGA TTTTCAGGGA TACCATTTTG 64140 

ATCGCTCTCT GGTAGAGTTG ACGTGACACT 64200 

GAGGCATTGG CTCCACTGCA GCCTGGGTGT 64260 

TACTGGGAAA CCGAACGCGG CGCTGTGGCT 64320 

GGATTGCCAT CTGAAGACAG AGGCAGGAGG 64380 

TCTCTTGCTC TATAACTGAA AGGAGGGCAT 64440 

ATCTAATTTT TAAGAATATG ATTTTAAAAT 64500 

TGCTGTTATT AAAAACGTGT TTACTTAAAT 64560 

AGCTGTGCAG GAAACAAACA GTGCAGTGAG 64620 

GGGTGGGACA TAGTTTCAAG CCAAATGACA 64680 

TAGAAATTTC CAGGGATCCC TCAGAGTTAT 64740 

GGGTGGAAAT ATTTCTTTAA GAGTCTTCCT 64800 

ATGGGAGCAC AGTGCAGGGA AGGCGGGATG 64860 

ACCTGGTCTA GCACTGGCAG AGCTGTGTGT 64920 

TTAGTTTCCT CCTCTAGGCA AAAGGGTTTG 64980 

CCTCAGATTC TGTGGCTGTG GTGATTGGGG 65040 

AAGCACGAAT CTGTGAGAGC TGAGAACAGG 65100 

AAGATTAGCA AGAGCCCTTA AATGGATTCT 65160 

ATACTAGTGG AGGTACTAAG ACCTGACACC 65220 

CTCCCATGAC AACATCCCAG CTTTGCCAAT 65280 

GTAGAAGTGG ATGAAACAGA CTGTTTTGTG 65340 

GGTGCTCTAC TGGTGTCTTC TATAAAACGC 65400 

GGCACTCGTG GTGTCTACTG AGTTTGTATG 65460 

ATGACTTACT GAGTAGATCG AACGTATGTG 65520 

GCTGCCCCAG CCACTGAGCA GCCTAACCTG 65580 

CTTGTATCTG GAGAAACCAG AACTTGCAAC 65640 

GCTCCTAGGG TTTTAATTGA CCTTGTTTTA 65700 

AAACTTATTT CACAAGCACC TAACCGCATT 65760 

CCCTTGTTCT GGTCAATTGG TCTGCATTAT 65820 

GGCCTCTGGG AGGCTTCCTC CCTTCTTTTT 65880 

GTTGTTACTA GGGGTACTTG CCTACTTATT 65940 

ATGTGCTTTC TGATTAAAAC AAAGCCAAAA 66000 

AGTTGGATTT TTAGACTAAC AGACCACCTC 66060 

ATTTTTATCT TTGTATGACC TTGTCTCAAT 66120 

GAACATCAAA TTTCATAGCT GCTTTTCCAC 66180 

AATCTTGGGG GCTGGGCTCT TGTTGGCCCA 66240 

TCCAGCCCTT TCTTAGCCTG ATACAACATC 66300 

TAAGAAGAGG GTCTTCTACT TTTTGAATAG 66360 

TGTGGAGGTT ATTCCAGGCT TCTTCTTAGG 66420 

GATGATATAT TCTGGACAAT AAGGTGAGCA 66480 

TTGTCATGAT TTCATCTAGT CAGCCTCATT 66540 

GACTTGCTTT GTCTTTGCAG TGTACTCTAG 66600 

TCATGCCCAC TTAGAAAATT AGAGTGCAGC 66660 

TAGGCAGAAG GCTGTGGGTC AAGGAATGTT 66720 

TGAAGGGGTA GGAGGCTAAG AGATTTTATG 66780 

AACACTGGAC TTAGAGTCCA GACACCTGAG 66840 

AATGACTTTG AGTGAGTTAT ATAAGCTCTA 66900 

GTTAATAACA TCTACTGCAT TGGGCTGTTG 66960 

CACCTGAACA AAAGCTTGTG AGTAAATAAT 67020 

GTCTTCAATT TGGGTGTTTT CAGTGAGTTT 67080 

TCTGGGATTT TGGTTTGTGT GTGTGTGTGT 67140 

ATTGGTTCTT CACTGTGACC TTAGGAAAGA 67200 

GTAGCAGGGA GATGCATGAA CCATATTAAG 67260
GGGGGACCTC CAAATTGAAC CTTGTTTTGA
TGGGAGACCT GGGGTGACTT GGGGTGATTG 
TGATGCCTGG CAAGTAGGGA CCTGCAGAAA 
ACTAGCTTTT AGGAGTGAGT ATACCGTTGC 
GATGTCTACT GACCACTGAT GCTGAGGTCA 
AAAAAAATTC TACTTACTTC TTTTGCCCAG 
TTCAAATGAG GTTTCTGGTC ATGCAAATGT 
TAGTTCTTAT TATTAGTAAA ACAGCAAGGA 
GGGTTTCTGT GAACTTGGAT GTAAAAAAAA 
GAATTTAACA TTTTCTTTCA ATATGAATGT 
ATTGTGACTA TCACCAGGAT AAATCACATT 
CACAAAAAGT GGGTATTTGA TATCAAGTTA 
TGTATTAACA AGGAAGCACA TATATTGTTA 
TGCCATTAAA AATAATTACA AAAACAGCCA 
ACTAAAAATA CAGGTGTGGT AGCACACACC 
AGGAGAATCA TTTGAACCTG GGAAGCAGAG 
CTCTAGCCTG AGCAACAGAG TGAGACTCTG 
TCTGTAATTA CTTTTGCACC AACATAATAT 
TGACATTGTC TTTTAATATA AATTTTTTTA 
AAATATTATT TTGAGAAGAG GCCTGTAGGC 
CAGATGAAAG GTTAAGAACA CCTCATTTAC 
ACTTCTGAAC AGCGCCAGTT GATAGATGTT 
TATGTTCCTG CCTATGGGTA GCCTTGGAGT 
TAATAAGAAT GATAATGGTA ATTTGTTGAG 
TGTTCAATGT CATGTTTAGT TTAATCCTCC 
AAGACAGAAG AGATCCCCTG CCCACCCACG 
AAGTGGGGAC TGACCCTTTG GAGATGGCTG 
ACGGGGCCCC ATGCTTAGAA AGGTTCCATG 
AAAATTCTTC GTAAGTTTTG AACAAAGGGC 
AAATGATGTA GCTGGTCCTG CCTCTGGTTA 
GACTGGGAGT CTGGGAAGCT GCTGCGGAGA 
ACGCAACCTC TGGTGGAAAG CTATGGCCTG 
GTGAGTTTTC CAGTTTTGTT TGAGCTCCAA 
TGAATAATTG GCTGAAAGGG ATTTAATCCC 
GCAATACTGG TGGGTTTTTC ATGATTTTAT 
AGCCAGGAGA AACAGAATAA AGTCTGTTCT 
AGTCTAGTGT TCTCATAGGA AGGCTCTAAA 
TTATCTCTAT AAAAGGAAAC TTTACTTCTT 
AATTCAGATG TTATTTGCTT CTTAAGCTGC 
TTCTGTTGGT GTTTTAGGGC TAAGTTTTTA 
TGGACGTAGT GATGGTGGAA GAGGGGAAGA 
GAATTCAGGC TCCCCCAATA ACATATTCAT 
TAACACAAGC AAGCCTGTCA TCAGGACCAA 
CCAAAAGAGC CATAATTCAC CCTTCATTTC 
TGCTCTTTTA TCATGGTTAA TAAGTCTGAG 
AGTCAATCTG GGGCCTCTAA CAGGAAGCCA 
GCTGTGAACT GTGTTTCTGT AGCATCCAGA 
TAGGGTACAA GGATGATCAA TATGTCTCCT 
TCGTGCTTAA AGGATCTCAA CTTTGAAATT 
TTCTCTTTAG AAACACAAGG ATCCATTTTC 
CCCACCCCTT CCCAAGTACA CTTATTATAA 
TTTAGAAAAG TTGCTTAACC TCTCTCTGTA 
GGGTTATTGT GAGGATTAAA TGAGGTAATT 
ATTTAAACCC TCAACAAATA TGAACTATTA 
AATTATATAA TCTTCAAGAT TCTGAGATGG 
CTTGGTCTTT TCCTAGGAGG AAACTTGACT
GTCAACTGCA AACCACAACC AAGGAGGTCC 67320

GGTATGCAGC ACATTCCTGT TCTTGTGTCC 67380 

ATACTGATTC TCCTCCAGGC AGTTCACATG 67440 

CCACCCCTAA AATTCTTGAT CATGTCTCCA 67500 

TGAATCTTGG GCATTCTAGA GGCTTTGGGA 67560 

ACACTCTGGG GTCTACCTCT TGGTAAATTA 67620 

GGTTTCTAGA GCCTATTTGA ATTGAACAAG 67680 

TCCCTAACTT GGGGTCCAAG GGTAAATTCA 67740 

ATTGTGTTTA TTTTCAATAA TCTCTAACTA 67800 

AGGCAAAACT CCATGGTAGT ATTAGCTGCA 67860 

TTCATGTCTT ATTACACCTA TTACATATAT 67920 

GATCTGCACT AGGTAGATAT TCTTATTTAA 67980 

TCAGGTTGGT GCAAAAGTAA TTGTGGTTCT 68040 

GTCTGGCCAA CATGGCGAAA CCCCATCTCT 68100 

TGTAATCCCA GCTACTTGGG AGGCTGAGGC 68160 

GCTGCAGTGA GCCAAGATCA CACCACTGCA 68220 

TCTCAAAAAA ATTAAAAAAT AAAAAAAAAC 68280 

GATATCACAC ATTTATTTTA AAAAGTATTT 68340 

AATCTTATAA TATTTTAATT TGTCATGTAA 68400 

CTCACTAGAT TACAAAACAG ATCCATCGTA 68460 

AGCATTCTCT CACACACGAC TAACGAAATG 68520 

CTCTGCCAAA AGGGGAATAT GATCTTCCCA 68580 

TGTGAAGGGA CTTTGGCATA ATGAAGATGA 68640 

TGCCTGCTGT AAGCCAGGTG GTTACAGTCC 68700 

CAATGACCTC AGGAGGTAGT GCATGGAACA 68760 

GTAATGAAAC ATGGGTACAG GTGAAGGCAA 68820 

ATGTCACGAG TGTGCAACCT GTGCAGTTCC 68880 

TTTGGTTTAA GGCTCTGCTG TTGCCATCTT 68940 

CCTGCATGTT CCTTTTACAC TGAGCTCTGC 69000 

TGGTGAAATG GAATGTATGA CAACTCCTGA 69060 

GCCCTCTCCT CATTTTCATC AGGCTCAGCT 69120 

TTGAGGAGGG AGGATGTCGT TTTTGAGTTA 69180 

AGCTTTCCTC CAAACAACTG GAAAGATGGC 69240 

TTGAAAAACC TTTCTGGTAG GGAGTTGCTG 69300 

TTTACAGAGG GCTTGCTACG TAAACCAGTG 69360 

GGAAGGAAAA ATGAGACCTG GTGTGCCACG 69420 

AACAAACTCA GCTTTCCTGC TATTGAATGA 69480 

CTAAAGGAGA GGTCGTCTAA TTTGTGAGAA 69540 

AAGGATGCTA ATGAAATAAT TCTCATGAAG 69600 

TAGACTGTTC CAAAATTCAA AACAGGGATG 69660 

CTTTTCCTCG ATTTCTTTGC CTGAGGGATG 69720 

GGTCTTTCTC TGGTCAGTCA GTGATGTTCA 69780 

TCTGTGATGG CTGAGACATC AGGTGCTCTT 69840 

CCAAGGTTTT TTTTTTCTTG CTGTTATTAC 69900 

GTGGCTTCAG ACAGCCAGTC CTAACCCCTG 69960 

GACTGAAGTT CTGATAGATG GGTTTGAGTG 70020 

CTGATTTGCA CTGAAAGGGA GCTTCCATAT 70080 

GTTTATATTT GGTGGAAAAA GTTGTGGGAA 70140 

AAAAGTATAA CGTCCTAACA GACATCCTCC 70200 

AAGTAATTTC AAAAGAACTA TGTTGCTTTC 70260 

TATATCCAGT CCATTTGCTA GCTTTGTGTC 70320 

AAATGGTGCT TATATTAGTA CTAACATTCA 70380 

CATGTAATGA CTAGTTCTAT TTCTAGCACA 70440 

TCACTGTCAT AGTTTTTGTT GTTGTTTTCT 70500 

GGGCTGTTGC TCTTTCCTTG ACTTGAACAT 70560 

CTTGAAATGG TCAAATCCAT TGTCCTAGTT 70620
CATCCTGACC
TCCTGCACAG 
TTTTCTGCTG 
GCCTGGGAAC 
CCTGGAGAGG 
GTGGTCTGGT 
GGGGAAATGT 
TATATTATTT 
GGTGTGCAGT 
TCCTGCCTCA 
TTGTATATTT 
GACCTCAAGT 
CTGCGCCCGG 
AGTGCTGATG 
GCTCCTTGTG 
TGAGCATGTG 
TCCACCTGGG 
GATTTTGAGG 
CAGGGGTTTT 
CTTGACTGAG 
TACTCCCTAA 
TCGATCGTAT 
GGGCTGGCTG 
CAGAACCAGC 
TGGCCTATTT 
ACGCAGGTTT 
CAGCTCAGGC 
TCAATGCTTA 
GCCAACTCAA 
AGCCCCTGCC 
GCTCCTGACC 
AGACAACCCA 
TGAAACTCAG 
AGGCAAGGGC 
AATTCTTTTC 
CTTGCCCCAT 
CCACTATTAT 
TGTTTTGTTA 
TGGTAAATAT 



CCTCCCTGGC 
TTGGTCTTAT 
CCCGATTTGG 
GCTCAGTCAG 
CAACATTTCT 
AGTAATGAAC 
TGCATGTTGC 
TATTTTTATT 
GGCACAGTCA 
GCCTCCCAAG 
TTAGTAGAGA 
GATATGCCCG 
CCAAAAAGAG 
CCATTGCTTA 
GCCCTTTTTG 
ATGATGCTTG 
ACCTGAGACC 
TAGGTGTGGC 
CAAATTATGC 
ATCTGGGTAC 
ACTGTATTGA 
CCCAGCAGTC 
GAGGCTGGCC 
CAATGGAAAA 
GATACCCCCT 
TTCTTTGGTT 
TGTTTGGGCA 
ACTGCGTTGT 
ATGAAAAGGT 
AGCAACTGAA 
CTCCTACACA 
CTGCTTTATT 
AAGCATTATC 
ATGGGGTTGC 
AAGGTTGGTC 
TCCATTCACA 
TTTGGAAGCC 
ACTATTGTAT 
GTATGAATTA 



TCCAATCCCC 
TTATTTTTCA 
TTCTAAGCAC 
CCATATCCCT 
GAGGCCCCAC 
CCCCAACGGC 
CTTGACTGTG 
TTTATTTTTG 
TGGCTCACTG 
TAGCTGGGAC 
CGGGGTTTCA 
CCTCGGACTC 
ATATTCAAAA 
ATTAATGTTG 
GACTTAGCTT 
GATTAGGACA 
TCAAATCTCT 
TAGCCTCATT 
TTTTTGCCCA 
ACTGAGCCTC 
CTTTTGGAAG 
TTTTCTCTGC 
ATGTCACTTT 
GTACAACGTC 
TTTATATTGA 
TTTATAGGAA 
GATGCCTTTC 
TATCGGAGCA 
AGGGCTTATA 
TTAACAGCTC 
TAAACACACC 
CTGCCCTGAG 
CTCTCTGCCA 
CGGAGAGAAG 
ATTCTCAAAT 
ACAGGAGGTG 
GTGTCCTTGT 
CGCCAGCACC 
CAGAGGGT 



ACCCCTTACC 
GTCAACTAAG 
TCCACTCCCT 
CCCCTGTCGG 
TGCCATAAGC 
CCGGAGAAAA 
CTTTCTTCTA 
AGATGGAGTC 
CAACCTCCAC 
TACAGGCACA 
CCATATTAAC 
CCAAAGTGCT 
GCTCCCTCTG 
TTCATGATCT 
ATCATGTGAC 
GAAATCACAT 
TGGCAGGAGA 
TATGCGATGG 
GGGCTGGATG 
CACTTTAGGA 
TCAACCATTT 
CCTTGTTAAT 
GCAGACCATG 
TTCTGGCTTC 
GGGAGTGAAA 
AAACAAATTG 
TTTGCTTTTT 
GAGCAACAGG 
CCCTCTGGGA 
TGTTTACGGT 
ATTGTCTCAG 
TGGAGATTGG 
ACTCCACGTC 
AGGATTGGTC 
GCCAGAGAGG 
GGGAATGAGC 
GAATAGTCCA 
TAGCAAAGTG 



GTCCTCCACC 
GGTGTTGTTA 
ACGCTGCTCA 
AACACCCAGT 
CCCCCTCCCC 
CTGGGGCAAG 
CAAAGCTTAA 
TTACTTGGTT 
CTCCTGGGTT 
TGCCACCATG 
TACATTGGTC 
GGGATTAAAA 
ACTGTGTGTG 
CCATTTGGGC 
ATTGACAAAT 
CTAGGACATC 
TGAGTGGGTC 
GAAAACTGTG 
TAGGATGTCT 
GGTAACCTAG 
AGAAGAGTGT 
CTGATTCATG 
GACACCCCTG 
TCAGCCTTGC 
ATGTAGCATC 
GCATGAACAC 
TCTGTTTATT 
TGCAAAAAAA 
GGTATTCAGA 
GGGTTTTATG 
AGAGAGACAT 
TTTTGGCTCA 
CTAGTCAGAG 
CTGCTTTTAA 
GTTGCCCGGC 
TCAGATGACT 
TCAGGGTAGG 
CCCAGCATCT 



CTTCCTACAT 
AATCTTTTAT 
TAACAAGAAT 
TCTTAATGCT 
CATGAAGCCA 
GTGTTTGTCT 
AAAGAGATAT 
GCCCAGGCTG 
CAAGTGATTC 
CCTGGCTAAT 
TTGAACTCCT 
GCATGAGCCA 
CTGAAGGCTG 
GATTTGTTTA 
TAATGAGAAG 
TCAGGCCCTT 
TACACAGCCC 
GTCCGGGAAC 
GGGGGAGAGG 
AGACTACACC 
GGTTTTGGTT 
ATCTGAACCT 
AGTGCCCTCA 
CATCTCCCTC 
CAAACTGAAA 
TCAGTCAAAC 
TTCCTACAAA 
TAACTCTGCT 
AGATAACAGA 
TTAACAACCT 
TCAGCCATCC 
GGCTGCTTTG 
TTTTCTGTGA 
GCCTAGCTGA 
TCTCTCTGCT 
TTGGAAGGAG 
GCAGCGTCTA 
AGTAGACACT 



70680 
70740 
70800 
70860 
70920 
70980 
71040 
71100 
71160 
71220 
71280 
71340 
71400 
71460 
71520 
71580 
71640 
71700 
71760 
71820 
71880 
71940 
72000 
72060 
72120 
72180 
72240 
72300 
72360 
72420 
72480 
72540 
72600 
72660 
72720 
72780 
72840 
72900 
72928 



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5427 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

ATTTATCTTC ACTCTGATGA GGGCTCAGAC TTGATAACGC CCGTGGTGCC CCATCCCTAT 60 

AGGAGCTGGT GAGATTGCAG CCTGCTGCCT CCCCTCCATC AGCCACAGCT ATTGGATTTC 120 

CCACCCAGAA TCTTTAGGTA AATGAGATCA TGATTCTGGA AGGAGGTGGT GTAATGAATC 180 

TCAACCCCGG CAACAACCTC CTTCACCAGC CGCCAGCCTG GACAGACAGC TACTCCACGT 240 

GCAATGTTTC CAGTGGGTTT TTTGGAGGCC AGTGGCATGA AATTCATCCT CAGTACTGGA 300 

CCAAGTACCA GGTGTGGGAG TGGCTCCAGC ACCTCCTGGA CACCAACCAG CTGGATGCCA 360 

ATTGTATCCC TTTCCAAGAG TTCGACATCA ACGGCGAGCA CCTCTGCAGC ATGAGTTTGC 420 

AGGAGTTCAC CCGGGCGGCA GGGACGGCGG GGCAGCTCCT CTACAGCAAC TTGCAGCATC 480 

TGAAGTGGAA CGGCCAGTGC AGTAGTGACC TGTTCCAGTC CACACACAAT GTCATTGTCA 540 

AGACTGAACA AACTGAGCCT TCCATCATGA ACACCTGGAA AGACGAGAAC TATTTATATG 600 

ACACCAACTA TGGTAGCACA GTAGATTTGT TGGACAGCAA AACTTTCTGC CGGGCTCAGA 660 

TCTCCATGAC AACCACCAGT CACCTTCCTG TTGCAGAGTC ACCTGATATG AAAAAGGAGC 720 

AAGACCCCCC TGCCAAGTGC CACACCAAAA AGCACAACCC GAGAGGGACT CACTTATGGG 780 

AATTCATCCG CGACATCCTC TTGAACCCAG ACAAGAACCC AGGATTAATA AAATGGGAAG 840 

ACCGATCTGA GGGCGTCTTC AGGTTCTTGA AATCAGAGGC AGTGGCTCAG CTATGGGGTA 900 

AAAAGAAGAA CAACAGCAGC ATGACCTATG AAAAGCTCAG CCGAGCTATG AGATATTACT 960 

ACAAAAGAGA AATACTGGAG CGTGTGGATG GACGAAGACT GGTATATAAA TTTGGGAAGA 1020 

ATGCCCGAGG ATGGAGAGAA AATGAAAACT GAAGCTGCCA ATACTTTGGA CACAAACCAA 1080 

AACACACACC AAATAATCAG AAACAAAGAA CTCCTGGACG TAAATATTTC AAAGACTACT 1140 

TTTCTCTGAT ATTTATGTAC CATGAGGGGA AAAAGAAACT ACTTCTAACG GGAAGAAGAA 1200 

ACACTACAGT CGATTAAAAA AATTATTTTG TTACTTCGAA GTATGTCCTA TATGGGGAAA 1260 

AAACGTACAC AGTTTTCTGT GAAATATGAT GCTGTATGTG GTTGTGATTT TTTTTCACCT 1320 

CTATTGTGAA TTCTTTTTCA CTGCAAGAGT AACAGGATTT GTAGCCTTGT GCTTCTTGCT 1380 

AAGAGAAAGA AAAACAAAAT CAGAGGGCAT TAAATGTTTT GTATGTGACA TGATTTAGAA 1440 

AAAGGTGATG CATCCTCCTC ACATAAGCAT CCATATGGCT TCGTCAAGGG AGGTGAACAT 1500 

TGTTGCTGAG TTAAATTCCA GGGTCTCAGA TGGTTAGGAC AAAGTGGATG GATGCCGGGA 1560 

AGTTTAACCT GAGCCTTAGG ATCCAATGAG TGGAGAATGG GGACTTCCAA AACCCAAGGT 1620 

TGGCTATAAT CTCTGCATAA CCACATGACT TGGAATGCTT AAATCAGCAA GAAGAATAAT 1680 

GGTGGGGTCT TTATACTCAT TCAGGAATGG TTTATCTGAT GCCAGGGCTG TCTTCCTTTC 1740 

TCCCCTTTGG ATGGTTGGTG AAATACTTTA ATTGCCCTGT CTGCTCACTT CTAGCTATTT 1800 

AAGAGAGAAC CCAGCTTGGT TCTTTTTTGC TCCAAGTGCT TAAAAATAAG TTGGAAAAAG 1860 

GAGACGGTGG TGTGGAAATG GCTGAAGAGT TTGCTCTTGT ATCCCTATAG TCCAAGGTTT 1920 

CTCAATCTGC ACAATTGACA TTTTTGGCCG GAGTGTTCTT TGTGGTGAGG GCTTTCCTGT 1980 

GCATTGTAAG ATGTTCAGCA GTATCCACTC ATGGTCTCTA ACCACTTGAC ACCAGAAACC 2040 

CCCCAGCTGT GATAACGCAA AATGTCTCTA GACATCACCA AATGTTCCCT GGGGGTGGCA 2100 

AATTTGCCCT TGATTGAGAA CCACCAGTTT AGCTAGTCAA TATGAGGATG GTGGTTTATT 2160 

CTCAGAAGAA AAAGATATGT AAGGTCTTTT AGCTCCTTAG AGTGAAGCAA AAGCAAGACT 2220 

TCAACCTCAA CCTATCTTTA TGTTTTAAAT ATTAGGGACA ATAAGTTGAA ATAGCTAGAG 2280 

GAGCTTCTTT TCAGAACCCC AGATGAGAGC CAATGTCAGA TAAAGTAAGC ATAGCAATGT 2340 

AGCAGGAACT ACAATAGAAG ACATTTTCAC TGGAATTACA AAGCAGAATT AAAATTATAT 2400 

TGTAGAAGGA AACACCAAGA AAAGAATTTC CAGGGAAAAT CCTCTTTGCA GGTATTAATT 2460 

CTTATAATTT TTTGTCTTTT GGATTATCTG TTTACTGTCT CATCTGAACT GATCCCAGGT 2520 

GAACGGTTTA TTGCCTAGAT TTGTACTCAG AGGAATTTTT TTTGTTTTGT TTTGTCTTTT 2580 

AAGAAAGGAA AGAAAGGATG AAAAAAATAA ACAGAAAACT CAGCTCAGGC ACAATTGTCA 2640 

CCAAGGAGTT AAAAGCTTCT TCTTCAATAG AGGAATTGTT CTGGGGGTCC TGGAGACTTA 2700 

CCATTGAGCC ATGCAATCTG GGAAGCACAG GAATAAGTAG ACACTTTGAA AATGGATTTG 2760 

AATGTTCTCA TCCCTTTTGC AGCTTTTCTT TTTGGCTCTC TCATGTCCTT GGCTTGCTCC 2820 

TCTATTCTAC CTCTCTTTCT CCAGCAATAA TATGCAAATG AAGACATGTA TCCATAAGAA 2880 

GGAGTGCTCT TCATCAACTA ATAGAGCACC TACCACAGTG TCATACCTGG TAGAGGTGAG 2940 

CAATTCATAT TCAAAGGTTG CAAAGTGTTT GTAATATATT CATGAGGCTG GAAGTAAGAA 3000 

GAATTAAAAA TTTGTCCTAA TTACAATGAG AACCATTCTA GGTAGTGATC TTGGAGCACA 3060 

CATGAATAAC TTTCTGAAGG TGCAACCAAA TCCATTTTTA TTTCTGCCTG GCTTGGTCAC 3120 

CTCTGTAAAG GTTTAACTTA GTGTTGTCAA GTAACAGTTA CTGAAAGAGC TGAGAAAAAG 3180 

AACAATGAAC AGCAACGATC TTGACTGTGC AACTCAGACA TTCCTGCAGA AAAGACATAT 3240 

GTTGCTTTAC AAGAAGGCCA AAGAACTATG GGGCCTTCCC AGCATTTGAC TGTTCATTGC 3300 

ATAGAATGAA TTAAATATCC AGTTACTTGA ATGGGTATAA CGCATGAATA TTTGTGTGTC 3360 

TGTGTGTGTG TCTGAGTTGT GTGATTTTAT TAGGGGCATC TGCCAATTCT CTCACTGTGG 3420 

TTCCTTCTCT GACTTTGCCT GTTCATCATC TAAGGAGGCT AGATCCTTCG CTGACTTCAC 3480 

CATTCCTCAA ACCTGTAAGT TTCTCACTTC TTCCAAATTG GCTTTGGCTC TTTCTTCAAC 3540 

CTTTCCATTC AAGAGCAATC TTTGCTAAGG AGTAAGTGAA TGTGAAGAGT ACCAACTACA 3600 

ACAATTCTAC AGATAATTAG TGGATTGTGT TGTTTGTTGA GAGTGAAGGT TTCTTGGCAT 3660 

CTGGTGCCTG ATTAAGGCTT GAGTATTAAG TTCTCAGCAT ATCTCTCTAT TGTCTTGACT 3720 

TGAGTTTGCT GCATTTTCTA TGTGCTGTTC GTGACTTGGA GAACTTAAAG TAATCGAGCT 3780 

ATGCCAACTT GGGGTGGTAA CAGAGTACTT CCCACCACAG TGTTGAAAGG GAGAGCAAAG 3840 

TCTTATGGAT AAACCCTCCT TTCTTTTGGG GACACATGGC TCTCACTTGA GAAGCTCACC 3900 

TGTGCTGAAT GTCCACATGG TCACTAAACA TGTTATCCTT AAACCCCCCG TATGCCTGAG 3960 

TTGAAAGGGC TCTCTCTTAT TAGGTTTTCA TGGGAACATG AGGCAGCAAA TCTATTGCTA 4020 

AGACTTTACC AGGCTCAAAT CATCTGAGGC TGATAGATAT TTGACTTGGT AAGACTTAAG 4080 

TAAGGCTCTG GCTCCCAGGG GCATAAGCAA CAGTTTCTTG AATGTGCCAT CTGAGAAGGG 4140 

AGACCCAGGT TATGAGTTTT CCTTTGAACA CATTGGTCTT TTCTCAAAGT TCCTGCCTTG 4200 

CTAGACTGTT AGCTCTTTGA GGACAGGGAC TATGTCTTAT CAATCACTAT TATTTTCCTG 4260 

TTACCTAGCA TGGGACAAGT ACACAACACA TATTTGTTCA ATGAATGAAT GAATGTCTTC 4320 

TAAAAGACTC CTCTGATTGG GAGACCATAT CTATAATTGG GATGTGAATC ATTTCTTCAG 4380 

TGGAATAAGA GCACAACGGC ACAACCTTCA AGGACATATT ATCTACTATG AACATTTTAC 4440 

TGTGAGACTC TTTATTTTGC CTTCTACTTG CGCTGAAATG AAACCAAAAC AGGCCGTTGG 4500 

GTTCCACAAG TCAATATATG TTGGATGAGG ATTCTGTTGC CTTATTGGGA ACTGTGAGAC 4560 

TTATCTGGTA TGAGAAGCCA GTAATAAACC TTTGACCTGT TTTAACCAAT GAAGATTATG 4620 

AATATGTTAA TATGATGTAA ATTGCTATTT AAGTGTAAAG CAGTTCTAAG TTTTAGTATT 4680 

TGGGGGATTG GTTTTTATTA TTTTTTTCCT TTTTGAAAAA TACTGAGGGA TCTTTTGATA 4740 

AAGTTAGTAA TGCATGTTAG ATTTTAGTTT TGCAAGCATG TTGTTTTTCA AATATATCAA 4800 

GTATAGAAAA AGGTAAAACA GTTAAGAAGG AAGGCAATTA TATTATTCTT CTGTAGTTAA 4860 

GCAAACACTT GTTGAGTGCC TGCTATGTGC ACGGCATGGG CCCATATGTG TGAGGAGCTT 4920 

GTCTAATTAT GTAGGAAGCA ATAGATCTCG GTAGTTACGT ATTGGGCAGA TACTTACTGT 4980 

ATGAATGAAA GAACATCACA GTAATCACAA TATCAGAGCT GAATTATCCT CAGTGTAGCT 5040 

TCTTGGAATT CAGTTTCTGG AACTAGAGAT AGAGCATTTA TTAAAAAAAA CTCCTGTTGA 5100 

GACTGTGTCT TATGAACCTC TGAAACGTAC AAGCCTTCAC AAGTTTAACT AAATTGGGAT 5160 

TAATCTTTCT GTAGTTATCT GCATAATTCT TGTTTTTCTT TCCATCTGGC TCCTGGGTTG 5220 

ACAATTTGTG GAAACAACTC TATTGCTACT ATTTAAAAAA AATCAGAAAT CTTTCCCTTT 5280 

AAGCTATGTT AAATTCAAAC TATTCCTGCT ATTCCTGTTT TGTCAAAGAA TTATATTTTT 5340 

CAAAATATGT TTATTTGTTT GATGGGTCCC AGGAAACACT AATAAAAACC ACAGAGACCA 5400 

GCCTGGAAAA AAAAAAAAAA AAAAAAA 5427 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5510 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

ATCGCTCTCT GGTAGAGTTG ACGTGACACT CATTTCTGTT GTGGGTGGGG CCCTGGTTGG 60 

GAGGCATTGG CTCCACTGCA GCCTGGGTGT CTAGAGACCA CATTCTCACC CTGCCTTTGT 120 

TACTGGGAAA CCGAACGCGG CGCTGTGGCT TTCAGCTTGG GTAAGCCGGG TCTGCGGCGG 180 

GGATTGCCAT CTGAAGACAG AGGCAGGAGG GCAGCCACAC CTTGCCCAGA TCATGATTCT 240 

GGAAGGAGGT GGTGTAATGA ATCTCAACCC CGGCAACAAC CTCCTTCACC AGCCGCCAGC 300 

CTGGACAGAC AGCTACTCCA CGTGCAATGT TTCCAGTGGG TTTTTTGGAG GCCAGTGGCA 360 

TGAAATTCAT CCTCAGTACT GGACCAAGTA CCAGGTGTGG GAGTGGCTCC AGCACCTCCT 420 

GGACACCAAC CAGCTGGATG CCAATTGTAT CCCTTTCCAA GAGTTCGACA TCAACGGCGA 480 

GCACCTCTGC AGCATGAGTT TGCAGGAGTT CACCCGGGCG GCAGGGACGG CGGGGCAGCT 540 

CCTCTACAGC AACTTGCAGC ATCTGAAGTG GAACGGCCAG TGCAGTAGTG ACCTGTTCCA 600 

GTCCACACAC AATGTCATTG TCAAGACTGA ACAAACTGAG CCTTCCATCA TGAACACCTG 660 

GAAAGACGAG AACTATTTAT ATGACACCAA CTATGGTAGC ACAGTAGATT TGTTGGACAG 720 
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CAAAACTTTC TGCCGGGCTC AGATCTCCAT GACAACCACC AGTCACCTTC CTGTTGCAGA 780 

GTCACCTGAT ATGAAAAAGG AGCAAGACCC CCCTGCCAAG TGCCACACCA AAAAGCACAA 840 

CCCGAGAGGG ACTCACTTAT GGGAATTCAT CCGCGACATC CTCTTGAACC CAGACAAGAA 900 

CCCAGGATTA ATAAAATGGG AAGACCGATC TGAGGGCGTC TTCAGGTTCT TGAAATCAGA 960 

GGCAGTGGCT CAGCTATGGG GTAAAAAGAA GAACAACAGC AGCATGACCT ATGAAAAGCT 1020 

CAGCCGAGCT ATGAGATATT ACTACAAAAG AGAAATACTG GAGCGTGTGG ATGGACGAAG 1080 

ACTGGTATAT AAATTTGGGA AGAATGCCCG AGGATGGAGA GAAAATGAAA ACTGAAGCTG 1140 

CCAATACTTT GGACACAAAC CAAAACACAC ACCAAATAAT CAGAAACAAA GAACTCCTGG 1200 

ACGTAAATAT TTCAAAGACT ACTTTTCTCT GATATTTATG TACCATGAGG GGAAAAAGAA 1260 

ACTACTTCTA ACGGGAAGAA GAAACACTAC AGTCGATTAA AAAAATTATT TTGTTACTTC 1320 

GAAGTATGTC CTATATGGGG AAAAAACGTA CACAGTTTTC TGTGAAATAT GATGCTGTAT 1380 

GTGGTTGTGA TTTTTTTTCA CCTCTATTGT GAATTCTTTT TCACTGCAAG AGTAACAGGA 1440 

TTTGTAGCCT TGTGCTTCTT GCTAAGAGAA AGAAAAACAA AATCAGAGGG CATTAAATGT 1500 

TTTGTATGTG ACATGATTTA GAAAAAGGTG ATGCATCCTC CTCACATAAG CATCCATATG 1560 

GCTTCGTCAA GGGAGGTGAA CATTGTTGCT GAGTTAAATT CCAGGGTCTC AGATGGTTAG 1620 

GACAAAGTGG ATGGATGCCG GGAAGTTTAA CCTGAGCCTT AGGATCCAAT GAGTGGAGAA 1680 

TGGGGACTTC CAAAACCCAA GGTTGGCTAT AATCTCTGCA TAACCACATG ACTTGGAATG 1740 

CTTAAATCAG CAAGAAGAAT AATGGTGGGG TCTTTATACT CATTCAGGAA TGGTTTATCT 1800 

GATGCCAGGG CTGTCTTCCT TTCTCCCCTT TGGATGGTTG GTGAAATACT TTAATTGCCC 1860 

TGTCTGCTCA CTTCTAGCTA TTTAAGAGAG AACCCAGCTT GGTTCTTTTT TGCTCCAAGT 1920 

GCTTAAAAAT AAGTTGGAAA AAGGAGACGG TGGTGTGGAA ATGGCTGAAG AGTTTGCTCT 1980 

TGTATCCCTA TAGTCCAAGG TTTCTCAATC TGCACAATTG ACATTTTTGG CCGGAGTGTT 2040 

CTTTGTGGTG AGGGCTTTCC TGTGCATTGT AAGATGTTCA GCAGTATCCA CTCATGGTCT 2100 

CTAACCACTT GACACCAGAA ACCCCCCAGC TGTGATAACG CAAAATGTCT CTAGACATCA 2160 

CCAAATGTTC CCTGGGGGTG GCAAATTTGC CCTTGATTGA GAACCACCAG TTTAGCTAGT 2220 

CAATATGAGG ATGGTGGTTT ATTCTCAGAA GAAAAAGATA TGTAAGGTCT TTTAGCTCCT 2280 

TAGAGTGAAG CAAAAGCAAG ACTTCAACCT CAACCTATCT TTATGTTTTA AATATTAGGG 2340 

ACAATAAGTT GAAATAGCTA GAGGAGCTTC TTTTCAGAAC CCCAGATGAG AGCCAATGTC 2400 

AGATAAAGTA AGCATAGCAA TGTAGCAGGA ACTACAATAG AAGACATTTT CACTGGAATT 2460 

ACAAAGCAGA ATTAAAATTA TATTGTAGAA GGAAACACCA AGAAAAGAAT TTCCAGGGAA 2520 

AATCCTCTTT GCAGGTATTA ATTCTTATAA TTTTTTGTCT TTTGGATTAT CTGTTTACTG 2580 

TCTCATCTGA ACTGATCCCA GGTGAACGGT TTATTGCCTA GATTTGTACT CAGAGGAATT 2640 

TTTTTTGTTT TGTTTTGTCT TTTAAGAAAG GAAAGAAAGG ATGAAAAAAA TAAACAGAAA 2700 

ACTCAGCTCA GGCACAATTG TCACCAAGGA GTTAAAAGCT TCTTCTTCAA TAGAGGAATT 2760 

GTTCTGGGGG TCCTGGAGAC TTACCATTGA GCCATGCAAT CTGGGAAGCA CAGGAATAAG 2820 

TAGACACTTT GAAAATGGAT TTGAATGTTC TCATCCCTTT TGCAGCTTTT CTTTTTGGCT 2880 

CTCTCATGTC CTTGGCTTGC TCCTCTATTC TACCTCTCTT TCTCCAGCAA TAATATGCAA 2940 

ATGAAGACAT GTATCCATAA GAAGGAGTGC TCTTCATCAA CTAATAGAGC ACCTACCACA 3000 

GTGTCATACC TGGTAGAGGT GAGCAATTCA TATTCAAAGG TTGCAAAGTG TTTGTAATAT 3060 

ATTCATGAGG CTGGAAGTAA GAAGAATTAA AAATTTGTCC TAATTACAAT GAGAACCATT 3120 

CTAGGTAGTG ATCTTGGAGC ACACATGAAT AACTTTCTGA AGGTGCAACC AAATCCATTT 3180 

TTATTTCTGC CTGGCTTGGT CACCTCTGTA AAGGTTTAAC TTAGTGTTGT CAAGTAACAG 3240 

TTACTGAAAG AGCTGAGAAA AAGAACAATG AACAGCAACG ATCTTGACTG TGCAACTCAG 3300 

ACATTCCTGC AGAAAAGACA TATGTTGCTT TACAAGAAGG CCAAAGAACT ATGGGGCCTT 3360 

CCCAGCATTT GACTGTTCAT TGCATAGAAT GAATTAAATA TCCAGTTACT TGAATGGGTA 3420 

TAACGCATGA ATATTTGTGT GTCTGTGTGT GTGTCTGAGT TGTGTGATTT TATTAGGGGC 3480 

ATCTGCCAAT TCTCTCACTG TGGTTCCTTC TCTGACTTTG CCTGTTCATC ATCTAAGGAG 3540 

GCTAGATCCT TCGCTGACTT CACCATTCCT CAAACCTGTA AGTTTCTCAC TTCTTCCAAA 3600 

TTGGCTTTGG CTCTTTCTTC AACCTTTCCA TTCAAGAGCA ATCTTTGCTA AGGAGTAAGT 3660 

GAATGTGAAG AGTACCAACT ACAACAATTC TACAGATAAT TAGTGGATTG TGTTGTTTGT 3720 

TGAGAGTGAA GGTTTCTTGG CATCTGGTGC CTGATTAAGG CTTGAGTATT AAGTTCTCAG 3780 

CATATCTCTC TATTGTCTTG ACTTGAGTTT GCTGCATTTT CTATGTGCTG TTCGTGACTT 3840 

GGAGAACTTA AAGTAATCGA GCTATGCCAA CTTGGGGTGG TAACAGAGTA CTTCCCACCA 3900 

CAGTGTTGAA AGGGAGAGCA AAGTCTTATG GATAAACCCT CCTTTCTTTT GGGGACACAT 3960 

GGCTCTCACT TGAGAAGCTC ACCTGTGCTG AATGTCCACA TGGTCACTAA ACATGTTATC 4020 

CTTAAACCCC CCGTATGCCT GAGTTGAAAG GGCTCTCTCT TATTAGGTTT TCATGGGAAC 4080 

ATGAGGCAGC AAATCTATTG CTAAGACTTT ACCAGGCTCA AATCATCTGA GGCTGATAGA 4140 

TATTTGACTT GGTAAGACTT AAGTAAGGCT CTGGCTCCCA GGGGCATAAG CAACAGTTTC 4200 

TTGAATGTGC CATCTGAGAA GGGAGACCCA GGTTATGAGT TTTCCTTTGA ACACATTGGT 4260 

CTTTTCTCAA AGTTCCTGCC TTGCTAGACT GTTAGCTCTT TGAGGACAGG GACTATGTCT 4320 

TATCAATCAC TATTATTTTC CTGTTACCTA GCATGGGACA AGTACACAAC ACATATTTGT 4380 

TCAATGAATG AATGAATGTC TTCTAAAAGA CTCCTCTGAT TGGGAGACCA TATCTATAAT 4440 

TGGGATGTGA ATCATTTCTT CAGTGGAATA AGAGCACAAC GGCACAACCT TCAAGGACAT 4500 

ATTATCTACT ATGAACATTT TACTGTGAGA CTCTTTATTT TGCCTTCTAC TTGCGCTGAA 4560 

ATGAAACCAA AACAGGCCGT TGGGTTCCAC AAGTCAATAT ATGTTGGATG AGGATTCTGT 4620 

TGCCTTATTG GGAACTGTGA GACTTATCTG GTATGAGAAG CCAGTAATAA ACCTTTGACC 4680 

TGTTTTAACC AATGAAGATT ATGAATATGT TAATATGATG TAAATTGCTA TTTAAGTGTA 4740 

AAGCAGTTCT AAGTTTTAGT ATTTGGGGGA TTGGTTTTTA TTATTTTTTT CCTTTTTGAA 4800 

AAATACTGAG GGATCTTTTG ATAAAGTTAG TAATGCATGT TAGATTTTAG TTTTGCAAGC 4860 

ATGTTGTTTT TCAAATATAT CAAGTATAGA AAAAGGTAAA ACAGTTAAGA AGGAAGGCAA 4920 

TTATATTATT CTTCTGTAGT TAAGCAAACA CTTGTTGAGT GCCTGCTATG TGCACGGCAT 4980 

GGGCCCATAT GTGTGAGGAG CTTGTCTAAT TATGTAGGAA GCAATAGATC TCGGTAGTTA 5040 

CGTATTGGGC AGATACTTAC TGTATGAATG AAAGAACATC ACAGTAATCA CAATATCAGA 5100 

GCTGAATTAT CCTCAGTGTA GCTTCTTGGA ATTCAGTTTC TGGAACTAGA GATAGAGCAT 5160 

TTATTAAAAA AAACTCCTGT TGAGACTGTG TCTTATGAAC CTCTGAAACG TACAAGCCTT 5220 

CACAAGTTTA ACTAAATTGG GATTAATCTT TCTGTAGTTA TCTGCATAAT TCTTGTTTTT 5280 

CTTTCCATCT GGCTCCTGGG TTGACAATTT GTGGAAACAA CTCTATTGCT ACTATTTAAA 5340 

AAAAATCAGA AATCTTTCCC TTTAAGCTAT GTTAAATTCA AACTATTCCT GCTATTCCTG 5400 

TTTTGTCAAA GAATTATATT TTTCAAAATA TGTTTATTTG TTTGATGGGT CCCAGGAAAC 5460 

ACTAATAAAA ACCACAGAGA CCAGCCTGGA AAAAAAAAAA AAAAAAAAAA 5510 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5667 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

ATCGCTCTCT GGTAGAGTTG ACGTGACACT CATTTCTGTT GTGGGTGGGG CCCTGGTTGG 60 

GAGGCATTGG CTCCACTGCA GCCTGGGTGT CTAGAGACCA CATTCTCACC CTGCCTTTGT 120 

TACTGGGAAA CCGAACGCGG CGCTGTGGCT TTCAGCTTGG GTAAGCCGGG TCTGCGGCGG 180 

GGATTGCCAT CTGAAGACAG AGGCAGGAGG GCAGCCACAC CTTGCCCAGC TGCACACCCA 240 

GTAACAAGTT TCCTCAGTGC GGGTATCTGC CACAGGCTGG GCTGGTCATC AAAGGGCCTC 300 

AGTCATATTT TAATAGAGCT CTTCAAGTAT CTGGCTTTGT GATAATATCA GGAATCAGTT 360 

GGTTTCTCTG ACAGACACTG CCCATTATCA TGATTCTGGA AGGAGGTGGT GTAATGAATC 420 

TCAACCCCGG CAACAACCTC CTTCACCAGC CGCCAGCCTG GACAGACAGC TACTCCACGT 480 

GCAATGTTTC CAGTGGGTTT TTTGGAGGCC AGTGGCATGA AATTCATCCT CAGTACTGGA 540 

CCAAGTACCA GGTGTGGGAG TGGCTCCAGC ACCTCCTGGA CACCAACCAG CTGGATGCCA 600 

ATTGTATCCC TTTCCAAGAG TTCGACATCA ACGGCGAGCA CCTCTGCAGC ATGAGTTTGC 660 

AGGAGTTCAC CCGGGCGGCA GGGACGGCGG GGCAGCTCCT CTACAGCAAC TTGCAGCATC 720 

TGAAGTGGAA CGGCCAGTGC AGTAGTGACC TGTTCCAGTC CACACACAAT GTCATTGTCA 780 

AGACTGAACA AACTGAGCCT TCCATCATGA ACACCTGGAA AGACGAGAAC TATTTATATG 840 

ACACCAACTA TGGTAGCACA GTAGATTTGT TGGACAGCAA AACTTTCTGC CGGGCTCAGA 900 

TCTCCATGAC AACCACCAGT CACCTTCCTG TTGCAGAGTC ACCTGATATG AAAAAGGAGC 960 

AAGACCCCCC TGCCAAGTGC CACACCAAAA AGCACAACCC GAGAGGGACT CACTTATGGG 1020 

AATTCATCCG CGACATCCTC TTGAACCCAG ACAAGAACCC AGGATTAATA AAATGGGAAG 1080 

ACCGATCTGA GGGCGTCTTC AGGTTCTTGA AATCAGAGGC AGTGGCTCAG CTATGGGGTA 1140 

AAAAGAAGAA CAACAGCAGC ATGACCTATG AAAAGCTCAG CCGAGCTATG AGATATTACT 1200 

ACAAAAGAGA AATACTGGAG CGTGTGGATG GACGAAGACT GGTATATAAA TTTGGGAAGA 1260 

ATGCCCGAGG ATGGAGAGAA AATGAAAACT GAAGCTGCCA ATACTTTGGA CACAAACCAA 1320 

AACACACACC AAATAATCAG AAACAAAGAA CTCCTGGACG TAAATATTTC AAAGACTACT 1380 

TTTCTCTGAT ATTTATGTAC CATGAGGGGA AAAAGAAACT ACTTCTAACG GGAAGAAGAA 1440 

ACACTACAGT CGATTAAAAA AATTATTTTG TTACTTCGAA GTATGTCCTA TATGGGGAAA 1500 

AAACGTACAC AGTTTTCTGT GAAATATGAT GCTGTATGTG GTTGTGATTT TTTTTCACCT 1560 

CTATTGTGAA TTCTTTTTCA CTGCAAGAGT AACAGGATTT GTAGCCTTGT GCTTCTTGCT 1620 

AAGAGAAAGA AAAACAAAAT CAGAGGGCAT TAAATGTTTT GTATGTGACA TGATTTAGAA 1680 

AAAGGTGATG CATCCTCCTC ACATAAGCAT CCATATGGCT TCGTCAAGGG AGGTGAACAT 1740 

TGTTGCTGAG TTAAATTCCA GGGTCTCAGA TGGTTAGGAC AAAGTGGATG GATGCCGGGA 1800 

AGTTTAACCT GAGCCTTAGG ATCCAATGAG TGGAGAATGG GGACTTCCAA AACCCAAGGT 1860 

TGGCTATAAT CTCTGCATAA CCACATGACT TGGAATGCTT AAATCAGCAA GAAGAATAAT 1920 

GGTGGGGTCT TTATACTCAT TCAGGAATGG TTTATCTGAT GCCAGGGCTG TCTTCCTTTC 1980 

TCCCCTTTGG ATGGTTGGTG AAATACTTTA ATTGCCCTGT CTGCTCACTT CTAGCTATTT 2040 

AAGAGAGAAC CCAGCTTGGT TCTTTTTTGC TCCAAGTGCT TAAAAATAAG TTGGAAAAAG 2100 

GAGACGGTGG TGTGGAAATG GCTGAAGAGT TTGCTCTTGT ATCCCTATAG TCCAAGGTTT 2160 

CTCAATCTGC ACAATTGACA TTTTTGGCCG GAGTGTTCTT TGTGGTGAGG GCTTTCCTGT 2220 

GCATTGTAAG ATGTTCAGCA GTATCCACTC ATGGTCTCTA ACCACTTGAC ACCAGAAACC 2280 

CCCCAGCTGT GATAACGCAA AATGTCTCTA GACATCACCA AATGTTCCCT GGGGGTGGCA 2340 

AATTTGCCCT TGATTGAGAA CCACCAGTTT AGCTAGTCAA TATGAGGATG GTGGTTTATT 2400 

CTCAGAAGAA AAAGATATGT AAGGTCTTTT AGCTCCTTAG AGTGAAGCAA AAGCAAGACT 2460 

TCAACCTCAA CCTATCTTTA TGTTTTAAAT ATTAGGGACA ATAAGTTGAA ATAGCTAGAG 2520 

GAGCTTCTTT TCAGAACCCC AGATGAGAGC CAATGTCAGA TAAAGTAAGC ATAGCAATGT 2580 

AGCAGGAACT ACAATAGAAG ACATTTTCAC TGGAATTACA AAGCAGAATT AAAATTATAT 2640 

TGTAGAAGGA AACACCAAGA AAAGAATTTC CAGGGAAAAT CCTCTTTGCA GGTATTAATT 2700 

CTTATAATTT TTTGTCTTTT GGATTATCTG TTTACTGTCT CATCTGAACT GATCCCAGGT 2760 

GAACGGTTTA TTGCCTAGAT TTGTACTCAG AGGAATTTTT TTTGTTTTGT TTTGTCTTTT 2820 

AAGAAAGGAA AGAAAGGATG AAAAAAATAA ACAGAAAACT CAGCTCAGGC ACAATTGTCA 2880 

CCAAGGAGTT AAAAGCTTCT TCTTCAATAG AGGAATTGTT CTGGGGGTCC TGGAGACTTA 2940 

CCATTGAGCC ATGCAATCTG GGAAGCACAG GAATAAGTAG ACACTTTGAA AATGGATTTG 3000 

AATGTTCTCA TCCCTTTTGC AGCTTTTCTT TTTGGCTCTC TCATGTCCTT GGCTTGCTCC 3060 

TCTATTCTAC CTCTCTTTCT CCAGCAATAA TATGCAAATG AAGACATGTA TCCATAAGAA 3120 

GGAGTGCTCT TCATCAACTA ATAGAGCACC TACCACAGTG TCATACCTGG TAGAGGTGAG 3180 

CAATTCATAT TCAAAGGTTG CAAAGTGTTT GTAATATATT CATGAGGCTG GAAGTAAGAA 3240 

GAATTAAAAA TTTGTCCTAA TTACAATGAG AACCATTCTA GGTAGTGATC TTGGAGCACA 3300 

CATGAATAAC TTTCTGAAGG TGCAACCAAA TCCATTTTTA TTTCTGCCTG GCTTGGTCAC 3360 

CTCTGTAAAG GTTTAACTTA GTGTTGTCAA GTAACAGTTA CTGAAAGAGC TGAGAAAAAG 3420 

AACAATGAAC AGCAACGATC TTGACTGTGC AACTCAGACA TTCCTGCAGA AAAGACATAT 3480 

GTTGCTTTAC AAGAAGGCCA AAGAACTATG GGGCCTTCCC AGCATTTGAC TGTTCATTGC 3540 

ATAGAATGAA TTAAATATCC AGTTACTTGA ATGGGTATAA CGCATGAATA TTTGTGTGTC 3600 

TGTGTGTGTG TCTGAGTTGT GTGATTTTAT TAGGGGCATC TGCCAATTCT CTCACTGTGG 3660 

TTCCTTCTCT GACTTTGCCT GTTCATCATC TAAGGAGGCT AGATCCTTCG CTGACTTCAC 3720 

CATTCCTCAA ACCTGTAAGT TTCTCACTTC TTCCAAATTG GCTTTGGCTC TTTCTTCAAC 3780 

CTTTCCATTC AAGAGCAATC TTTGCTAAGG AGTAAGTGAA TGTGAAGAGT ACCAACTACA 3840 

ACAATTCTAC AGATAATTAG TGGATTGTGT TGTTTGTTGA GAGTGAAGGT TTCTTGGCAT 3900 

CTGGTGCCTG ATTAAGGCTT GAGTATTAAG TTCTCAGCAT ATCTCTCTAT TGTCTTGACT 3960 

TGAGTTTGCT GCATTTTCTA TGTGCTGTTC GTGACTTGGA GAACTTAAAG TAATCGAGCT 4020 

ATGCCAACTT GGGGTGGTAA CAGAGTACTT CCCACCACAG TGTTGAAAGG GAGAGCAAAG 4080 

TCTTATGGAT AAACCCTCCT TTCTTTTGGG GACACATGGC TCTCACTTGA GAAGCTCACC 4140 

TGTGCTGAAT GTCCACATGG TCACTAAACA TGTTATCCTT AAACCCCCCG TATGCCTGAG 4200 

TTGAAAGGGC TCTCTCTTAT TAGGTTTTCA TGGGAACATG AGGCAGCAAA TCTATTGCTA 4260 

AGACTTTACC AGGCTCAAAT CATCTGAGGC TGATAGATAT TTGACTTGGT AAGACTTAAG 4320 

TAAGGCTCTG GCTCCCAGGG GCATAAGCAA CAGTTTCTTG AATGTGCCAT CTGAGAAGGG 4380 

AGACCCAGGT TATGAGTTTT CCTTTGAACA CATTGGTCTT TTCTCAAAGT TCCTGCCTTG 4440 

CTAGACTGTT AGCTCTTTGA GGACAGGGAC TATGTCTTAT CAATCACTAT TATTTTCCTG 4500 

TTACCTAGCA TGGGACAAGT ACACAACACA TATTTGTTCA ATGAATGAAT GAATGTCTTC 4560 

TAAAAGACTC CTCTGATTGG GAGACCATAT CTATAATTGG GATGTGAATC ATTTCTTCAG 4620 

TGGAATAAGA GCACAACGGC ACAACCTTCA AGGACATATT ATCTACTATG AACATTTTAC 4680 

TGTGAGACTC TTTATTTTGC CTTCTACTTG CGCTGAAATG AAACCAAAAC AGGCCGTTGG 4740 

GTTCCACAAG TCAATATATG TTGGATGAGG ATTCTGTTGC CTTATTGGGA ACTGTGAGAC 4800 

TTATCTGGTA TGAGAAGCCA GTAATAAACC TTTGACCTGT TTTAACCAAT GAAGATTATG 4860 

AATATGTTAA TATGATGTAA ATTGCTATTT AAGTGTAAAG CAGTTCTAAG TTTTAGTATT 4920 

TGGGGGATTG GTTTTTATTA TTTTTTTCCT TTTTGAAAAA TACTGAGGGA TCTTTTGATA 4980 

AAGTTAGTAA TGCATGTTAG ATTTTAGTTT TGCAAGCATG TTGTTTTTCA AATATATCAA 5040 

GTATAGAAAA AGGTAAAACA GTTAAGAAGG AAGGCAATTA TATTATTCTT CTGTAGTTAA 5100 

GCAAACACTT GTTGAGTGCC TGCTATGTGC ACGGCATGGG CCCATATGTG TGAGGAGCTT 5160 

GTCTAATTAT GTAGGAAGCA ATAGATCTCG GTAGTTACGT ATTGGGCAGA TACTTACTGT 5220 

ATGAATGAAA GAACATCACA GTAATCACAA TATCAGAGCT GAATTATCCT CAGTGTAGCT 5280 

TCTTGGAATT CAGTTTCTGG AACTAGAGAT AGAGCATTTA TTAAAAAAAA CTCCTGTTGA 5340 

GACTGTGTCT TATGAACCTC TGAAACGTAC AAGCCTTCAC AAGTTTAACT AAATTGGGAT 5400 

TAATCTTTCT GTAGTTATCT GCATAATTCT TGTTTTTCTT TCCATCTGGC TCCTGGGTTG 5460 

ACAATTTGTG GAAACAACTC TATTGCTACT ATTTAAAAAA AATCAGAAAT CTTTCCCTTT 5520 

AAGCTATGTT AAATTCAAAC TATTCCTGCT ATTCCTGTTT TGTCAAAGAA TTATATTTTT 5580 

CAAAATATGT TTATTTGTTT GATGGGTCCC AGGAAACACT AATAAAAACC ACAGAGACCA 5640 

GCCTGGAAAA AAAAAAAAAA AAAAAAA 5667 



(2) INFORMATION FOR SEQ ID NO: 5: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 300 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 



Met Ile Leu Glu Gly Gly Gly Val Met Asn Leu Asn Pro Gly Asn Asn 

1 5 10 15 

Leu Leu His Gln Pro Pro Ala Trp Thr Asp Ser Tyr Ser Thr Cys Asn 

20 25 30 

Val Ser Ser Gly Phe Phe Gly Gly Gln Trp His Glu Ile His Pro Gln 

35 40 45 

Tyr Trp Thr Lys Tyr Gln Val Trp Glu Trp Leu Gln His Leu Leu Asp 

50 55 60 

Thr Asn Gln Leu Asp Ala Asn Cys Ile Pro Phe Gln Glu Phe Asp Ile 

65 70 75 80 

Asn Gly Glu His Leu Cys Ser Met Ser Leu Gln Glu Phe Thr Arg Ala 

85 90 95 

Ala Gly Thr Ala Gly Gln Leu Leu Tyr Ser Asn Leu Gln His Leu Lys 

100 105 110 

Trp Asn Gly Gln Cys Ser Ser Asp Leu Phe Gln Ser Thr His Asn Val 

115 120 125 

Ile Val Lys Thr Glu Gln Thr Glu Pro Ser Ile Met Asn Thr Trp Lys 

130 135 140 

Asp Glu Asn Tyr Leu Tyr Asp Thr Asn Tyr Gly Ser Thr Val Asp Leu 

145 150 155 160 

Leu Asp Ser Lys Thr Phe Cys Arg Ala Gln Ile Ser Met Thr Thr Thr 

165 170 175 

Ser His Leu Pro Val Ala Glu Ser Pro Asp Met Lys Lys Glu Gln Asp 

180 185 190 

Pro Pro Ala Lys Cys His Thr Lys Lys His Asn Pro Arg Gly Thr His 

195 200 205 

Leu Trp Glu Phe Ile Arg Asp Ile Leu Leu Asn Pro Asp Lys Asn Pro 

210 215 220 

Gly Leu Ile Lys Trp Glu Asp Arg Ser Glu Gly Val Phe Arg Phe Leu 

225 230 235 240 

Lys Ser Glu Ala Val Ala Gln Leu Trp Gly Lys Lys Lys Asn Asn Ser 

245 250 255 

Ser Met Thr Tyr Glu Lys Leu Ser Arg Ala Met Arg Tyr Tyr Tyr Lys 

260 265 270 

Arg Glu Ile Leu Glu Arg Val Asp Gly Arg Arg Leu Val Tyr Lys Phe 

275 280 285 

Gly Lys Asn Ala Arg Gly Trp Arg Glu Asn Glu Asn 

290 295 300 

(2) INFORMATION FOR SEQ ID NO:6: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2428 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

CAGGGGTGCC GGGTTGCTCA GGCCATGGGA GCCACACCTG TTATTGCTGC CTCTGATTTG 60 

TGTGACACTG AGAAGCCCAC AGGCCTGTCC CTCCAACTCG GTGGACCCTC TCTGTGTGCA 120 

TTTGGTGTGT GAGCCAGCTC TGAGAAGGGT TCAGAAGCCA CTGGAGGCAT CTGGGGACCT 180 

CAGCTTCCAT GCCATCTCTG CCTCACTCCC ACAGGGTAAT GTTGGACTCG GTGACACACA 240 

GCACCTTCCT GCCTAATGCA TCCTTCTGCG ATCCCCTGAT GTCGTGGACT GATCTGTTCA 300 

GCAATGAAGA GTACTACCCT GCCTTTGAGC ATCAGACAGC CTGTGACTCA TACTGGACAT 360 

CAGTCCACCC TGAATACTGG ACTAAGCGCC ATGTGTGGGA GTGGCTCCAG TTCTGCTGCG 420 

ACCAGTACAA GTTGGACACC AATTGCATCT CCTTCTGCAA CTTCAACATC AGTGGCCTGC 480 

AGCTGTGCAG CATGACACAG GAGGAGTTCG TCGAGGCAGC TGGCCTCTGC GGCGAGTACC 540 

TGTACTTCAT CCTCCAGAAC ATCCGCACAC AAGGTTACTC CTTTTTTAAT GACGCTGAAG 600 

AAAGCAAGGC CACCATCAAA GACTATGCTG ATTCCAACTG CTTGAAAACA AGTGGCATCA 660 

AAAGTCAAGA CTGTCACAGT CATAGTAGAA CAAGCCTCCA AAGTTCTCAT CTATGGGAAT 720 

TTGTACGAGA CCTGCTTCTA TCTCCTGAAG AAAACTGTGG CATTCTGGAA TGGGAAGATA 780 

GGGAACAAGG AATTTTTCGG GTGGTTAAAT CGGAAGCCCT GGCAAAGATG TGGGGACAAA 840 

GGAAGAAAAA TGACAGAATG ACGTATGAAA AGTTGAGCAG AGCCCTGAGA TACTACTATA 900 

AAACAGGAAT TTTGGAGCGG GTTGACCGAA GGTTAGTGTA CAAATTTGGA AAAAATGCAC 960 

ACGGGTGGCA GGAAGACAAG CTATGATCTG CTCCAGGCAT CAAGCTCATT TTATGGATTT 1020 

CTGTCTTTTA AAACAATCAG ATTGCAATAG ACATTCGAAA GGCTTCATTT TCTTCTCTTT 1080 

TTTTTTAACC TGCAAACATG CTGATAAAAT TTCTCCACAT CTCAGCTTAC ATTTGGATTC 1140 

AGAGTTGTTG TCTACGGAGG GTGAGAGCAG AAACTCTTAA GAAATCCTTT CTTCTCCCTA 1200 

AGGGGATGAG GGGATGATCT TTTGTGGTGT CTTGATCAAA CTTTATTTTC CTAGAGTTGT 1260 

GGAATGACAA CAGCCCATGC CATTGATGCT GATCAGAGAA AAACTATTCA ATTCTGCCAT 1320 

TAGAGACACA TCCAATGCTC CCATCCCAAA GGTTCAAAAG TTTTCAAATA ACTGTGGCAG 1380 

CTCACCAAAG GTGGGGGAAA GCATGATTAG TTTGCAGGTT ATGGTAGGAG AGGGTGAGAT 1440 

ATAAGACATA CATACTTTAG ATTTTAAATT ATTAAAGTCA AAAATCCATA GAAAAGTATC 1500 

CCTTTTTTTT TTTTTTGAGA CGGGTTCTCA CTATGTTGCC CAGGGCTGGT CTTGAACTCC 1560 

TATGCTCAAG TGATCCTCCC ACCTCGGCCT CCCAAAGTAC TGTGATTACA AGCGTGAGCC 1620 

ACGGCACCTG GGCAGAAAAG TATCTTAATT AATGAAAGAG CTAAGCCATC AAGCTGGGAC 1680 
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TTAATTGGAT TTAACATAGG TTCACAGAAA GTTTCCTAAC CAGAGCATCT TTTTGACCAC 1740 

TCAGCAAAAC TTCCACAGAC ATCCTTCTGG ACTTAAACAC TTAACATTAA CCACATTATT 1800 

AATTGTTGCT GAGTTTATTC CCCCTTCTAA CTGATGGCTG GCATCTGATA TGCAGAGTTA 1860 

GTCAACAGAC ACTGGCATCA ATTACAAAAT CACTGCTGTT TCTGTGATTC AAGCTGTCAA 1920 

CACAATAAAA TCGAAATTCA TTGATTCCAT CTCTGGTCCA GATGTTAAAC GTTTATAAAA 1980 

CCGGAAATGT CCTAACAACT CTGTAATGGC AAATTAAATT GTGTGTCTTT TTTGTTTTGT 2040 

CTTTCTACCT GATGTGTATT CAAGCGCTAT AACACGTATT TCCTTGACAA AAATAGTGAC 2100 

AGTGAATTCA CACTAATAAA TGTTCATAGG TTAAAGTCTG CACTGACATT TTCTCATCAA 2160 

TCACTGGTAT GTAAGTTATC AGTGACTGAC AGCTAGGTGG ACTGCCCCTA GGACTTCTGT 2220 

TTCACCAGAG CAGGAATCAA GTGGTGAGGC ACTGAATCGC TGTACAGGCT GAAGACCTCC 2280 

TTATTAGAGT TGAACTTCAA AGTAACTTGT TTTAAAAAAT GTGAATTACT GTAAAATAAT 2340 

CTATTTTGGA TTCATGTGTT TTCCAGGTGG ATATAGTTTG TAAACAATGT GAATAAAGTA 2400 

TTTAACATGT TCAAAAAAAA AAAAAAAA 2428 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 265 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 



Met Pro Ser Leu Pro His Ser His Arg Val Met Leu Asp Ser Val Thr 

1 5 10 15 

His Ser Thr Phe Leu Pro Asn Ala Ser Phe Cys Asp Pro Leu Met Ser 

20 25 30 

Trp Thr Asp Leu Phe Ser Asn Glu Glu Tyr Tyr Pro Ala Phe Glu His 

35 40 45 

Gln Thr Ala Cys Asp Ser Tyr Trp Thr Ser Val His Pro Glu Tyr Trp 

50 55 60 

Thr Lys Arg His Val Trp Glu Trp Leu Gln Phe Cys Cys Asp Gln Tyr 

65 70 75 80 

Lys Leu Asp Thr Asn Cys Ile Ser Phe Cys Asn Phe Asn Ile Ser Gly 

85 90 95 

Leu Gln Leu Cys Ser Met Thr Gln Glu Glu Phe Val Glu Ala Ala Gly 

100 105 110 

Leu Cys Gly Glu Tyr Leu Tyr Phe Ile Leu Gln Asn Ile Arg Thr Gln 

115 120 125 

Gly Tyr Ser Phe Phe Asn Asp Ala Glu Glu Ser Lys Ala Thr Ile Lys 

130 135 140 

Asp Tyr Ala Asp Ser Asn Cys Leu Lys Thr Ser Gly Ile Lys Ser Gln 

145 150 155 160 

Asp Cys His Ser His Ser Arg Thr Ser Leu Gln Ser Ser His Leu Trp 

165 170 175 

Glu Phe Val Arg Asp Leu Leu Leu Ser Pro Glu Glu Asn Cys Gly Ile 

180 185 190 

Leu Glu Trp Glu Asp Arg Glu Gln Gly Ile Phe Arg Val Val Lys Ser 

195 200 205 

Glu Ala Leu Ala Lys Met Trp Gly Gln Arg Lys Lys Asn Asp Arg Met 

210 215 220 

Thr Tyr Glu Lys Leu Ser Arg Ala Leu Arg Tyr Tyr Tyr Lys Thr Gly 

225 230 235 240 

Ile Leu Glu Arg Val Asp Arg Arg Leu Val Tyr Lys Phe Gly Lys Asn 

245 250 255 

Ala His Gly Trp Gln Glu Asp Lys Leu 

260 265 



(2) INFORMATION FOR SEQ ID NO: 8: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2280 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

CTGGGAGCGC CTGCCTTCTC TTGCCTTGAA AGCCTCCTCT TTGGACCTAG CCACCGCTGC 60 

CCTCACGGTA ATGTTGGACT CGGTGACACA CAGCACCTTC CTGCCTAATG CATCCTTCTG 120 

CGATCCCCTG ATGTCGTGGA CTGATCTGTT CAGCAATGAA GAGTACTACC CTGCCTTTGA 180 

GCATCAGACA GCCTGTGACT CATACTGGAC ATCAGTCCAC CCTGAATACT GGACTAAGCG 240 

CCATGTGTGG GAGTGGCTCC AGTTCTGCTG CGACCAGTAC AAGTTGGACA CCAATTGCAT 300 

CTCCTTCTGC AACTTCAACA TCAGTGGCCT GCAGCTGTGC AGCATGACAC AGGAGGAGTT 360 

CGTCGAGGCA GCTGGCCTCT GCGGCGAGTA CCTGTACTTC ATCCTCCAGA ACATCCGCAC 420 

ACAAGGTTAC TCCTTTTTTA ATGACGCTGA AGAAAGCAAG GCCACCATCA AAGACTATGC 480 

TGATTCCAAC TGCTTGAAAA CAAGTGGCAT CAAAAGTCAA GACTGTCACA GTCATAGTAG 540 

AACAAGCCTC CAAAGTTCTC ATCTATGGGA ATTTGTACGA GACCTGCTTC TATCTCCTGA 600 

AGAAAACTGT GGCATTCTGG AATGGGAAGA TAGGGAACAA GGAATTTTTC GGGTGGTTAA 660 

ATCGGAAGCC CTGGCAAAGA TGTGGGGACA AAGGAAGAAA AATGACAGAA TGACGTATGA 720 

AAAGTTGAGC AGAGCCCTGA GATACTACTA TAAAACAGGA ATTTTGGAGC GGGTTGACCG 780 

AAGGTTAGTG TACAAATTTG GAAAAAATGC ACACGGGTGG CAGGAAGACA AGCTATGATC 840 

TGCTCCAGGC ATCAAGCTCA TTTTATGGAT TTCTGTCTTT TAAAACAATC AGATTGCAAT 900 

AGACATTCGA AAGGCTTCAT TTTCTTCTCT TTTTTTTTAA CCTGCAAACA TGCTGATAAA 960 

ATTTCTCCAC ATCTCAGCTT ACATTTGGAT TCAGAGTTGT TGTCTACGGA GGGTGAGAGC 1020 

AGAAACTCTT AAGAAATCCT TTCTTCTCCC TAAGGGGATG AGGGGATGAT CTTTTGTGGT 1080 

GTCTTGATCA AACTTTATTT TCCTAGAGTT GTGGAATGAC AACAGCCCAT GCCATTGATG 1140 

CTGATCAGAG AAAAACTATT CAATTCTGCC ATTAGAGACA CATCCAATGC TCCCATCCCA 1200 

AAGGTTCAAA AGTTTTCAAA TAACTGTGGC AGCTCACCAA AGGTGGGGGA AAGCATGATT 1260 

AGTTTGCAGG TTATGGTAGG AGAGGGTGAG ATATAAGACA TACATACTTT AGATTTTAAA 1320 

TTATTAAAGT CAAAAATCCA TAGAAAAGTA TCCCTTTTTT TTTTTTTTGA GACGGGTTCT 1380 

CACTATGTTG CCCAGGGCTG GTCTTGAACT CCTATGCTCA AGTGATCCTC CCACCTCGGC 1440 

CTCCCAAAGT ACTGTGATTA CAAGCGTGAG CCACGGCACC TGGGCAGAAA AGTATCTTAA 1500 

TTAATGAAAG AGCTAAGCCA TCAAGCTGGG ACTTAATTGG ATTTAACATA GGTTCACAGA 1560 

AAGTTTCCTA ACCAGAGCAT CTTTTTGACC ACTCAGCAAA ACTTCCACAG ACATCCTTCT 1620 

GGACTTAAAC ACTTAACATT AACCACATTA TTAATTGTTG CTGAGTTTAT TCCCCCTTCT 1680 

AACTGATGGC TGGCATCTGA TATGCAGAGT TAGTCAACAG ACACTGGCAT CAATTACAAA 1740 

ATCACTGCTG TTTCTGTGAT TCAAGCTGTC AACACAATAA AATCGAAATT CATTGATTCC 1800 

ATCTCTGGTC CAGATGTTAA ACGTTTATAA AACCGGAAAT GTCCTAACAA CTCTGTAATG 1860 

GCAAATTAAA TTGTGTGTCT TTTTTGTTTT GTCTTTCTAC CTGATGTGTA TTCAAGCGCT 1920 

ATAACACGTA TTTCCTTGAC AAAAATAGTG ACAGTGAATT CACACTAATA AATGTTCATA 1980 

GGTTAAAGTC TGCACTGACA TTTTCTCATC AATCACTGGT ATGTAAGTTA TCAGTGACTG 2040 

ACAGCTAGGT GGACTGCCCC TAGGACTTCT GTTTCACCAG AGCAGGAATC AAGTGGTGAG 2100 

GCACTGAATC GCTGTACAGG CTGAAGACCT CCTTATTAGA GTTGAACTTC AAAGTAACTT 2160 

GTTTTAAAAA ATGTGAATTA CTGTAAAATA ATCTATTTTG GATTCATGTG TTTTCCAGGT 2220 

GGATATAGTT TGTAAACAAT GTGAATAAAG TATTTAACAT GTTCAAAAAA AAAAAAAAAA 2280

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 255 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:



Met Leu Asp Ser Val Thr His Ser Thr Phe Leu Pro Asn Ala Ser Phe

1 5 10 15

Cys Asp Pro Leu Met Ser Trp Thr Asp Leu Phe Ser Asn Glu Glu Tyr

20 25 30

Tyr Pro Ala Phe Glu His Gln Thr Ala Cys Asp Ser Tyr Trp Thr Ser

35 40 45

Val His Pro Glu Tyr Trp Thr Lys Arg His Val Trp Glu Trp Leu Gln

50 55 60

Phe Cys Cys Asp Gln Tyr Lys Leu Asp Thr Asn Cys Ile Ser Phe Cys

65 70 75 80

Asn Phe Asn Ile Ser Gly Leu Gln Leu Cys Ser Met Thr Gln Glu Glu

85 90 95

Phe Val Glu Ala Ala Gly Leu Cys Gly Glu Tyr Leu Tyr Phe Ile Leu

100 105 110

Gln Asn Ile Arg Thr Gln Gly Tyr Ser Phe Phe Asn Asp Ala Glu Glu

115 120 125

Ser Lys Ala Thr Ile Lys Asp Tyr Ala Asp Ser Asn Cys Leu Lys Thr

130 135 140

Ser Gly Ile Lys Ser Gln Asp Cys His Ser His Ser Arg Thr Ser Leu

145 150 155 160

Gln Ser Ser His Leu Trp Glu Phe Val Arg Asp Leu Leu Leu Ser Pro

165 170 175

Glu Glu Asn Cys Gly Ile Leu Glu Trp Glu Asp Arg Glu Gln Gly Ile

180 185 190

Phe Arg Val Val Lys Ser Glu Ala Leu Ala Lys Met Trp Gly Gln Arg

195 200 205

Lys Lys Asn Asp Arg Met Thr Tyr Glu Lys Leu Ser Arg Ala Leu Arg

210 215 220

Tyr Tyr Tyr Lys Thr Gly Ile Leu Glu Arg Val Asp Arg Arg Leu Val

225 230 235 240

Tyr Lys Phe Gly Lys Asn Ala His Gly Trp Gln Glu Asp Lys Leu

245 250 255
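
The relationship between SEQ ID NO: 8 and SEQ ID NO: 9 can be checked with a short translation sketch. The Python below is illustrative only and is not part of the listing: it assumes the standard genetic code and assumes that the open reading frame begins at the ATG at nucleotide 71 of SEQ ID NO: 8, a position inferred here from the printed sequence rather than stated in the listing. The names `translate`, `CODON_TABLE` and `THREE_LETTER` are hypothetical helpers.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter residue names, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna, start):
    """Translate from 1-based position `start` until a stop codon or the end."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-base blocks
    residues = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the reading frame
            break
        residues.append(THREE_LETTER[aa])
    return residues


# First 120 bases of SEQ ID NO: 8 as printed in the listing.
fragment = """
CTGGGAGCGC CTGCCTTCTC TTGCCTTGAA AGCCTCCTCT TTGGACCTAG CCACCGCTGC
CCTCACGGTA ATGTTGGACT CGGTGACACA CAGCACCTTC CTGCCTAATG CATCCTTCTG
"""

# Assumption: the reading frame starts at nucleotide 71.
peptide = translate(fragment, start=71)
print(" ".join(peptide))
```

Under these assumptions the fragment yields the first sixteen residues of SEQ ID NO: 9 (Met Leu Asp Ser Val Thr His Ser Thr Phe Leu Pro Asn Ala Ser Phe). Passing the full 2280-base sequence to the same function would be one way to proofread the remaining rows.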



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2498 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

GAGGGGCTGA CAGCGGCGTC CCTCGTCTGG GCAGCCTCCG CTCTGCCACT CTCCTCCCGT 60

CCTGAGGATG GGACCCCCGG AAAAGCGGCC TCTGGAGGCC TGCCATGGCA CCCAGAGCAG 120 

CCATTTTCCT CCCAGTTCTG GGGCTTTGGA AGGAGCTTGC GGATGAGGAG AGGGAGCCTC 180 

CGCAGGGCTC TGGCTCCCCT CCAGGGGCCG AGGCCGCACA CAAAGCCGCT CTGTGGCCCA 240

ATTACACCTA CTGGATAGGA TTGTTGAGGG GACCTGAGAA ACTTGAGACG ACAAGAACGC 300 

GTAGCGCCTC GGCTGGCTGA GGGTGCTGAG CCCTCGTGTT GTGTTCTCTC CAGCTTTCCC 360 

CGTGCCTCAG CCACTCTTCA CGTTCCATCT GTGCTCTGTG CTGACCCGCC TGTGACTCAT 420 

ACTGGACATC AGTCCACCCT GAATACTGGA CTAAGCGCCA TGTGTGGGAG TGGCTCCAGT 480 

TCTGCTGCGA CCAGTACAAG TTGGACACCA ATTGCATCTC CTTCTGCAAC TTCAACATCA 540 

GTGGCCTGCA GCTGTGCAGC ATGACACAGG AGGAGTTCGT CGAGGCAGCT GGCCTCTGCG 600 

GCGAGTACCT GTACTTCATC CTCCAGAACA TCCGCACACA AGGTTACTCC TTTTTTAATG 660 

ACGCTGAAGA AAGCAAGGCC ACCATCAAAG ACTATGCTGA TTCCAACTGC TTGAAAACAA 720 

GTGGCATCAA AAGTCAAGAC TGTCACAGTC ATAGTAGAAC AAGCCTCCAA AGTTCTCATC 780 

TATGGGAATT TGTACGAGAC CTGCTTCTAT CTCCTGAAGA AAACTGTGGC ATTCTGGAAT 840 

GGGAAGATAG GGAACAAGGA ATTTTTCGGG TGGTTAAATC GGAAGCCCTG GCAAAGATGT 900 

GGGGACAAAG GAAGAAAAAT GACAGAATGA CGTATGAAAA GTTGAGCAGA GCCCTGAGAT 960 

ACTACTATAA AACAGGAATT TTGGAGCGGG TTGACCGAAG GTTAGTGTAC AAATTTGGAA 1020 

AAAATGCACA CGGGTGGCAG GAAGACAAGC TATGATCTGC TCCAGGCATC AAGCTCATTT 1080 

TATGGATTTC TGTCTTTTAA AACAATCAGA TTGCAATAGA CATTCGAAAG GCTTCATTTT 1140 

CTTCTCTTTT TTTTTAACCT GCAAACATGC TGATAAAATT TCTCCACATC TCAGCTTACA 1200 

TTTGGATTCA GAGTTGTTGT CTACGGAGGG TGAGAGCAGA AACTCTTAAG AAATCCTTTC 1260 

TTCTCCCTAA GGGGATGAGG GGATGATCTT TTGTGGTGTC TTGATCAAAC TTTATTTTCC 1320 

TAGAGTTGTG GAATGACAAC AGCCCATGCC ATTGATGCTG ATCAGAGAAA AACTATTCAA 1380 

TTCTGCCATT AGAGACACAT CCAATGCTCC CATCCCAAAG GTTCAAAAGT TTTCAAATAA 1440 

CTGTGGCAGC TCACCAAAGG TGGGGGAAAG CATGATTAGT TTGCAGGTTA TGGTAGGAGA 1500 

GGGTGAGATA TAAGACATAC ATACTTTAGA TTTTAAATTA TTAAAGTCAA AAATCCATAG 1560 

AAAAGTATCC CTTTTTTTTT TTTTTGAGAC GGGTTCTCAC TATGTTGCCC AGGGCTGGTC 1620 

TTGAACTCCT ATGCTCAAGT GATCCTCCCA CCTGGGCCTC CCAAAGTACT GTGATTACAA 1680 

GCGTGAGCCA CGGCACCTGG GCAGAAAAGT ATCTTAATTA ATGAAAGAGC TAAGCCATCA 1740 

AGCTGGGACT TAATTGGATT TAACATAGGT TCACAGAAAG TTTCCTAACC AGAGCATCTT 1800 

TTTGACCACT CAGCAAAACT TCCACAGACA TCCTTCTGGA CTTAAACACT TAACATTAAC 1860 

CACATTATTA ATTGTTGCTG AGTTTATTCC CCCTTCTAAC TGATGGCTGG CATCTGATAT 1920 

GCAGAGTTAG TCAACAGACA CTGGCATCAA TTACAAAATC ACTGCTGTTT CTGTGATTCA 1980 

AGCTGTCAAC ACAATAAAAT CGAAATTCAT TGATTCCATC TCTGGTCCCA GATGTTAAAC 2040 

GTTTATAAAA CCGGAAATGT CCTAACAACT CTGTAATGGC AAATTAAATT GTGTGTCTTT 2100 

TTTGTTTTGT CTTTCTACCT GATGTGTATT CAAGCGCTAT AACACGTATT TCCTTGACAA 2160 

AAATAGTGAC AGTGAATTCA CACTAATAAA TGTTCATAGG TTAAAGTCTG CACTGACATT 2220 

TTCTCATCAA TCACTGGTAT GTAAGTTATC AGTGACTGAC AGCTAGGTGG ACTGCCCCTA 2280 

GGACTTCTGT TTCACCAGAG CAGGAATCAA GTGGTGAGGC ACTGAATCGC TGTACAGGCT 2340 

GAAGACCTCC TTATTAGAGT TGAACTTCAA AGTAACTTGT TTTAAAAAAT GTGAATTACT 2400 

GTAAAATAAT CTATTTTGGA TTCATGTGTT TTCCAGGTGG ATATAGTTTG TAAACAATGT 2460 

GAATAAAGTA TTTAACATGT TCAAAAAAAA AAAAAAAA 2498 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 164 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
Met Thr Gln Glu Glu Phe Val Glu Ala Ala Gly Leu Cys Gly Glu Tyr

1 5 10 15

Leu Tyr Phe Ile Leu Gln Asn Ile Arg Thr Gln Gly Tyr Ser Phe Phe

20 25 30

Asn Asp Ala Glu Glu Ser Lys Ala Thr Ile Lys Asp Tyr Ala Asp Ser

35 40 45

Asn Cys Leu Lys Thr Ser Gly Ile Lys Ser Gln Asp Cys His Ser His

50 55 60

Ser Arg Thr Ser Leu Gln Ser Ser His Leu Trp Glu Phe Val Arg Asp

65 70 75 80

Leu Leu Leu Ser Pro Glu Glu Asn Cys Gly Ile Leu Glu Trp Glu Asp

85 90 95

Arg Glu Gln Gly Ile Phe Arg Val Val Lys Ser Glu Ala Leu Ala Lys

100 105 110

Met Trp Gly Gln Arg Lys Lys Asn Asp Arg Met Thr Tyr Glu Lys Leu

115 120 125

Ser Arg Ala Leu Arg Tyr Tyr Tyr Lys Thr Gly Ile Leu Glu Arg Val

130 135 140

Asp Arg Arg Leu Val Tyr Lys Phe Gly Lys Asn Ala His Gly Trp Gln

145 150 155 160

Glu Asp Lys Leu



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
AAATGAGCCA ATGTTTGTAA T 21 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 
AAATGAGCCA GTGTTTGTAA T 21 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 736 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

AGGAAGTGAA GAACCTAGAT AATCCACCAA CCGGATAATC AGCTCTTGCA TATTTGAGAG 60 

TTGACTGCTT GACCTAAGCA TCTCCTCATA AGGTACCCTC CCTCCCAGGA CCTTCCCTTT 120 

CAAACCTCTC AAGGCTCTTA CCTGGGGCCA GGGGAGATAG GCTTTTCAAA GTCCATTGAA 180 

TTGCCAAGAG TCTCTGTCAA GAAGGCAGTC ATGGTGCCTG GAGAGGGAAC TTGCTGGGAG 240 

CCCCTTCAGA GCCTGGTACT TATAGAGCTA GGGAAAAGAT CTTGATGCCA AAGCAGGGTG 300 

GACTAAATAC AGACTAATAA ATGAGACAGG TGCTCAAGAG GGCCCCTCCA TACCATCATC 360 

TCCTCCAGAT TTGGACTTCT ACTCACTTTG CTTTTACATT CCCTCTTCCC GATGGTGTCT 420 

TTGGTGAGCA GGGTGCTTTT CACCTGAAAC AGCCTCTGAG CTGAAAAGAA CAGTCACCAC 480 

CAAATCAATT CCTCATCCAT TAACAGGTTG TCTCTCTGTT CTTGAGACAC AGGCATTACC 540 

TGGTTAGACC TGTTTTGTTT GAACACTAAC GTGTGAGTTG GCCAAATGCA AATGAGCCAA 600 

TGTTTGTAAT CCTTTATTTT ATTTTTTTAA AGGGCTGGGT AGCCAATCAG AAGAGGGGGA 660 

AGTGACTTAG GGAATTCCCG GTTGGTGGCT TATTGCTTAA CATCCTACAA AATGATTTAA 720 

AATTATTGTT ATATGC 736 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 333 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

GCCAGAGTCC TCCTTGAGAA CTTACAATGT GTCCATATTA AGGATCTGCT GTGTTTGATG 60 

ATTTTGTGAT TACACTTTAA ACTTCTTATC CATAAAGGAC ATACTTGATA TATCTGAGAC 120 

TTGTAGTAGA AGGCCTTGAG ACATCCATCT CATCCCATCA TTATCTATCT ATCATCTATC 180 

TATCTATCTA TCTATCTATC TATCTATCTA TCTATCATCT ATCTATCTAT CGCCAGTACT 240 

GTCTTGTTGA AGTTGGCAGT AGGGTGAAAG ACCTCAAACT CCAAAGGACT TTCCGTATGG 300

ATGCAATATA CCTGCAATTC TAGCTTTTCT GTG 333 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

ACAGAATGAC RTATGAAAAG T 21 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17 
GTAACCAAGC KCAAGCCACC C 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18 
AAGGAGCCCA YCTGAGTGCA G 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19 
CGTTCCATCT STGCTCTGTG C 

(2) INFORMATION FOR SEQ ID NO:20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20 
AGCGCCTCGG YTGGCTGAGG G 

(2) INFORMATION FOR SEQ ID NO:21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21
TGTATTCAAG YGCTATAACA C 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22 
CACTGAGAAG CCNACAGGCC TGT 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23 
CCCACAGGCC WGTCCCTCCA A 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 
CGTCCATCTC YAGCTCCAGG G 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 
GACTTGATAA YGCCCGTGGT G

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26 
ACTTGATAAC RCCCGTGGTG C 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 
CTCCCCTCCA WGAGCCACAG C 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28 
ATTTCCTGCA TNGTCTGGAC TT 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29 

ATCCAAACAC YTGAGTGGAA A 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30 
AGTTTCCTCA RTGCGGGAGC T 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 
GCGAGCACCT YTGCAGCATG A 

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 
TTCACCCGGG YGGCAGGGAC G

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33 
CTGGGGAAAA NNGATCGCTG AC 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34
GTCAATTAAA YGGCTCTCAT T 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35 
TAGATCATTC RTAACCTGCC T 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36 
AAAGAGAAAT WCTGGAGCGT G 

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37 
ATGAGGGGAA MAAGAAACTA C 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38 
TTTTGTATGT KACATGATTT A

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 
AGCTTGGTTC YTTTTTGCTC C 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40 
TTGACACCAG RAACCCCCCA G 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41 
AAATGAGCCA RTGTTTGTAA T 
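
Several of the 21-mers in this listing carry a single IUPAC ambiguity code at the central position (R, Y, K, M, S, W or N). As a reading aid only, the following Python sketch expands such a code into the explicit sequences it stands for. It assumes the standard IUPAC nucleotide code table; treating N as "any of A, C, G, T" is also an assumption, since the listing does not say what N denotes in entries such as SEQ ID NO: 81. The names `IUPAC` and `expand` are hypothetical and not part of the listing.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (assumed; not defined in the listing).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand(seq):
    """Return every explicit sequence encoded by an IUPAC-coded string."""
    seq = "".join(seq.split()).upper()  # drop the spaces between 10-base blocks
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


# SEQ ID NO: 41 as printed, with R (A or G) at position 11.
alleles = expand("AAATGAGCCA RTGTTTGTAA T")
for allele in alleles:
    print(allele)
```

Under these assumptions the two expansions of SEQ ID NO: 41 are AAATGAGCCAATGTTTGTAAT and AAATGAGCCAGTGTTTGTAAT, which are the sequences printed as SEQ ID NO: 12 and SEQ ID NO: 13.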

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42 

ATCCATTTTG YATTCCTCAT T 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43 
CTGGAGCTCA RACCAGACAG C 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44 
GCCAGTGCAG SCATCATTAC C 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45 
AGTTCAAATC RTAATTTTTA T 

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46 
TCATCAGAAT YTAAATCTCC C 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47 
GGAGATTCAG NTGAAGCAAG A 

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48 
TTTTTCCACA YCCAGCCTGG C 

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49 
CCCAGCCTGG YGAACCCTGG C 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50 
CTCTTCATCA YGGTCAAATA C 

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51 
CAACTTGCTG YCAAAGTGCT G

(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52 
TACTATGTGC YAGATACTAA G 

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53
ATGCCACTTT RRGACAACTT GAG 

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54 
CGCATGCCTG KAAAGAAGAG A 

(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55 

GGATAAGCAC MAGTGAGCCT G 

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56 
AAAGCCAGAC RGCAACTTGT G 

(2) INFORMATION FOR SEQ ID NO:57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57 
TCTCAAAAAG RGTGATAGGA G 

(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58 
TCTGAATCCT STCTCCTCCT T 

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59 
TAGAACCAGG WTGTGGGACC A 

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60
TTCTTGTGTC RGGCGCAAAA C 

(2) INFORMATION FOR SEQ ID NO:61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61 
AACCAACATG RAGAAACCCC A 

(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:62 
AATAAACTAT RGTTCACCTA G 

(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63 
ACATATTTGT RTCTCATATG A 

(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64 
CAAAGCAGTT YCTAATAATC C

(2) INFORMATION FOR SEQ ID NO: 65:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65 
AGATCCTAAC YGGGGCCTCC T 

(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66 
CTCTTTCTCT YTGCTTCCTC C 

(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67 
TTAGGAATCC WCAAATATGT A 

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68 

GTCTGACTCC RCCTCCCTCA T 

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69 
GAATCACATC RTGAGAAATG T 

(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70 
AATTCAATCC YTCACAGACT T 

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71 
GTGTAGCCAG RGTTGCTAAT T 

(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72 
CCTAGAAATA SCCAAGGGCA C 

(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73
AAATTCTCAT RCCTCACCCT C 

(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74 
TCCCACCCCT RTCACCTTCA T 

(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75 
CCTCATTCTC RGAAGCCAAC A 

(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76 
GAAGAGCCGT YCAGTCCCTT T 

(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77 
TCCATAGGCT YTTTATTTGG C

(2) INFORMATION FOR SEQ ID NO: 78:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78 
TCGTTTAGTA YACAGGCTTT G 

(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79 
GCCTCAGTTG YCCCAGCTAT A 

(2) INFORMATION FOR SEQ ID NO: 80: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80 
AGCAAAATGC WCTATGCACT G 

(2) INFORMATION FOR SEQ ID NO: 81: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81 

GTGTCCTGAC NNNNNNNNNN NACACTGCCT G 

(2) INFORMATION FOR SEQ ID NO: 82: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82
ATCAGATAAC RCCTACACTT A 

(2) INFORMATION FOR SEQ ID NO: 83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83 

TCTCTCTTCT SCCTGCCCTG T 

(2) INFORMATION FOR SEQ ID NO: 84: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84
TGGACACAGG KAGGGGAATA T 

(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85 
TGTCACTTGC RCATACAAGG C 

(2) INFORMATION FOR SEQ ID NO: 86: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86
ATCATCAGAT YAGCCCAGAA T 

(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87 
TCAACAGAGA RAGTTAATGG T 

(2) INFORMATION FOR SEQ ID NO: 88: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:88 
AGCAATAATG YTTCCCTTTT C 

(2) INFORMATION FOR SEQ ID NO: 89: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89 
TCTAGCTTTT YTGTGTTTTT T 

(2) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90 
GATTCCTTAA YGCTTGATAC T

(2) INFORMATION FOR SEQ ID NO: 91:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91 
CCTCCTCCAG YACCAAAGTG G 

(2) INFORMATION FOR SEQ ID NO: 92: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92 
ATGGCCACAG RTCAAATCCT G 

(2) INFORMATION FOR SEQ ID NO: 93: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93 
ACTGAGTGTT YATGCCAATT T 

(2) INFORMATION FOR SEQ ID NO: 94: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94 

GACAAGCCCT RTCTGACACA C 

(2) INFORMATION FOR SEQ ID NO: 95: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95 
TGAAAAGCCT YCTTGCTGCC T 

(2) INFORMATION FOR SEQ ID NO: 96: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96 
TCCTGGAGTT YCTTTGCTCC C 

(2) INFORMATION FOR SEQ ID NO: 97: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97 
GATTCCAAAT WAACTAAAGA T 

(2) INFORMATION FOR SEQ ID NO: 98: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98 
GACCTCAAGT CRTCCACCCG CC 

(2) INFORMATION FOR SEQ ID NO: 99: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99
AACAAATACT MCCCCGCAAC CC 

(2) INFORMATION FOR SEQ ID NO: 100: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100 
ATTTTTTTTT NAAGGAAAAT A 

(2) INFORMATION FOR SEQ ID NO: 101: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101 
AAATTTCCCC MAAACAAGCA G 

(2) INFORMATION FOR SEQ ID NO: 102: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102 
GAGAAAGGGT RTGTGTGTGT G 

(2) INFORMATION FOR SEQ ID NO: 103: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103 
GTGTGTGTGT NNNNGTATGT GCGCGTG 
(2) INFORMATION FOR SEQ ID NO: 104:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104 
ATCGGGAACC YCATACCCCA A 

(2) INFORMATION FOR SEQ ID NO: 105: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105 
TTTGTTTCGC MATGAGGTAC G 

(2) INFORMATION FOR SEQ ID NO: 106: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106 
TGAGGGTGTT STGGGCTGGA C 

(2) INFORMATION FOR SEQ ID NO: 107: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107

TCTTCATTGG YATCTGAATG T 

(2) INFORMATION FOR SEQ ID NO: 108: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108 
GCGAGCACCT YTGCAGCATG A 

(2) INFORMATION FOR SEQ ID NO: 109: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109
AACCCCCCCC MCACACACAC A 

(2) INFORMATION FOR SEQ ID NO: 110: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110 
TCAGTGCTCT STAATCAGTC A

(2) INFORMATION FOR SEQ ID NO: 111: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111 
TCTTTGTGAA ANNAATTAGT CTG 

(2) INFORMATION FOR SEQ ID NO: 112: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112
GCTGCCCTGA SAGCTGGGCC A 

(2) INFORMATION FOR SEQ ID NO: 113:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113 
CCTTCTGATC YTTGTTTGCT G 

(2) INFORMATION FOR SEQ ID NO: 114: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114 
GGAACACTGA KTCTTGATTA G 

(2) INFORMATION FOR SEQ ID NO: 115: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115
TAGGCTTCTC YTGATAATTG A 

(2) INFORMATION FOR SEQ ID NO: 116: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116 
TCTTAAAATA MTTGGCTTGT A 
(2) INFORMATION FOR SEQ ID NO: 117:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117 
TAGATCATTA RTAACCTGCC T 

(2) INFORMATION FOR SEQ ID NO: 118: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118
ATGAGGGGAA MAAGAAACTA C 

(2) INFORMATION FOR SEQ ID NO: 119: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119 
TTGACACCAG RAACCCCCCA G 

(2) INFORMATION FOR SEQ ID NO: 120: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120 

TGTTTTAAAT RTTAGGGACA A 

(2) INFORMATION FOR SEQ ID NO: 121: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121 
GTAAGCATAG YAATGTAGCA G 

(2) INFORMATION FOR SEQ ID NO: 122: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122 
GGCTCTTTCT KCAACCTTTC C 

(2) INFORMATION FOR SEQ ID NO: 123:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123
GACCCAGGTT RTGAGTTTTC C 

(2) INFORMATION FOR SEQ ID NO: 124: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124 
GACAGAATGA YATATGAAAA G 

(2) INFORMATION FOR SEQ ID NO: 125: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125 
TGTGTGACAC YGAGAAGCCC A 

(2) INFORMATION FOR SEQ ID NO: 126:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126 
AGTACTGGAC MAAGTACCAG G 

(2) INFORMATION FOR SEQ ID NO: 127:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127 
CCTGGGAGCA RGTATTGCAT T 

(2) INFORMATION FOR SEQ ID NO: 128:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128
AGATTTGAGG YCTCAGGTCC C 

(2) INFORMATION FOR SEQ ID NO: 129: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129 
TGTCAATGTC RCATGATAAG C 

(2) INFORMATION FOR SEQ ID NO: 130: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130 
TTGCCCCAGT KTTCTCCGGG C 

(2) INFORMATION FOR SEQ ID NO: 131: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131 
TATGAGCAGC RTAGGGAGTG G 

(2) INFORMATION FOR SEQ ID NO: 132: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132 
AGTTGACTGA AAAANTAAAT AAGAC 

(2) INFORMATION FOR SEQ ID NO: 133: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133
ATTCAAATAG SCTCTAGAAA C 

(2) INFORMATION FOR SEQ ID NO: 134: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134 
CCCAGAATTT MATATCCATT C 

(2) INFORMATION FOR SEQ ID NO: 135: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135 
TGACCCAACA RAAACTCACT G 

(2) INFORMATION FOR SEQ ID NO: 136: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136 
CCAGAATATA WCATCAGCCC T 

(2) INFORMATION FOR SEQ ID NO: 137: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137 
CATCAGCCCT WCTGAGGAGA T 

(2) INFORMATION FOR SEQ ID NO: 138:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138 
CCAGAACAGA YTTTATTCTG T 

(2) INFORMATION FOR SEQ ID NO: 139:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139 
TTCAGCCATC YTTCCAGTTG T 

(2) INFORMATION FOR SEQ ID NO: 140: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140 
TCACTAACTC WAAAACGACA T 

(2) INFORMATION FOR SEQ ID NO: 141: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141 
AACTCAAAAA YGACATCCTC C 

(2) INFORMATION FOR SEQ ID NO: 142: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142 
GAACTGCACA RGTTGCACAC T 

(2) INFORMATION FOR SEQ ID NO: 143: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143 
TTGTTCCATG SACTACCTCC T 

(2) INFORMATION FOR SEQ ID NO: 144: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144 
ACAGCAGGCA YTCAACAAAT T 

(2) INFORMATION FOR SEQ ID NO: 145:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145 
TTATTTTTGG STTTGTTTTA A 

(2) INFORMATION FOR SEQ ID NO: 146: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146 
TAGGCTGTTC YCTGCCATCA C 

(2) INFORMATION FOR SEQ ID NO: 147: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147 
GTGCTCTGGG MCACACAGCT C 

(2) INFORMATION FOR SEQ ID NO: 148: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 148 
AGACCCGATA RGAGCTCCTT C 

(2) INFORMATION FOR SEQ ID NO: 149: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 149 
CATCTTGCGC RGTCATGTAA G 

(2) INFORMATION FOR SEQ ID NO: 150: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 150 
CAGCACAGCT RTTCCCTCAA A 

(2) INFORMATION FOR SEQ ID NO: 151: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 151 
TTTGGAAACA YGGTGAAGTA T 

(2) INFORMATION FOR SEQ ID NO: 152: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 152 
ACACGGTGAA RTATTGTCTC C 

(2) INFORMATION FOR SEQ ID NO: 153: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 153 
AAAAGTGGAT MCTCTGCAAA C 

(2) INFORMATION FOR SEQ ID NO: 154:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154 
CTTCAAATGC RGCTATTAAA G 

(2) INFORMATION FOR SEQ ID NO: 155: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155 
CCTGGGAGCA YGGTAAATCA G 

(2) INFORMATION FOR SEQ ID NO: 156: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156 
TGAAAATGTC RCTTTCTCAC CT 

(2) INFORMATION FOR SEQ ID NO: 157: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 157 
CCTGATATTT RCCAACAAGA A 

(2) INFORMATION FOR SEQ ID NO: 158: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 158 
AAAGGGTTAG YTTGTCCCCT T 

(2) INFORMATION FOR SEQ ID NO: 159: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 159 
TGAAAATAAA ASACAATTTT TT 

(2) INFORMATION FOR SEQ ID NO: 160: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 160 
CTGCTGTGGA CGAATAGG 

(2) INFORMATION FOR SEQ ID NO: 161:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 161
TCAATATAAT CTTGCTTAAC TTGG 

(2) INFORMATION FOR SEQ ID NO: 162: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 162 
GACCTGTTTG GGTTGATTTC AG 

(2) INFORMATION FOR SEQ ID NO: 163: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 163 
GTTTCTTACA GTGTCTTGCT ATCACATCAC C 

(2) INFORMATION FOR SEQ ID NO: 164:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 164
GAGGACTGGC AGTACCAAGT AAAC 

(2) INFORMATION FOR SEQ ID NO: 165: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 165
GTTTCTTTGG TTCATTCTAA GATGGCTGG 
(2) INFORMATION FOR SEQ ID NO: 166:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 166 
GCTGAGGCAG GAGAAAAGAC AAG 

(2) INFORMATION FOR SEQ ID NO: 167: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 167
GTTTCTTCAT GCAAAGGTCA GGAGGTAGG 

(2) INFORMATION FOR SEQ ID NO: 168: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 168 
GTTGCTTCCA GACGAGGTAC ATG 

(2) INFORMATION FOR SEQ ID NO: 169:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 169 

GTTTCTTCAA TGGCTCCACA AACATCTCTG 

(2) INFORMATION FOR SEQ ID NO: 170: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 170 
AGGTTTAGGG GACAGGGTTT GG 

(2) INFORMATION FOR SEQ ID NO: 171: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 171 
GTTTCTTTCC TGGCTAACAC GGTGAAATC 

(2) INFORMATION FOR SEQ ID NO: 172: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 172 
GTTTCTTATT GCCTCCTCCC AAAATTC 

(2) INFORMATION FOR SEQ ID NO: 173: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 173 
AGAGGCCACT GGAAGACGAA 

(2) INFORMATION FOR SEQ ID NO: 174: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 174
AACTGGAGTC AGGCAAAACG TG 

(2) INFORMATION FOR SEQ ID NO: 175: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 175 
GTTTCTTTGG CTGGTAAGGA AAGAAACCAC 

(2) INFORMATION FOR SEQ ID NO: 176: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 176 
GGCTAGGTTC ATAAACTCTG TGCTG 

(2) INFORMATION FOR SEQ ID NO: 177: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 177 
GTTTCTTGAT TGTTTGAGAT CCTTGACCCA G 

(2) INFORMATION FOR SEQ ID NO: 178: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 178 
GCCGAAATCA CAACACTGCA TC 
(2) INFORMATION FOR SEQ ID NO: 179:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 179 
GTTTCTTGAT TCTGCTCTTA CTCTTGCCCC 

(2) INFORMATION FOR SEQ ID NO: 180: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 180
GTAATAGAAC CAAAGGGCTG AGAC 

(2) INFORMATION FOR SEQ ID NO: 181: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 181 
GTTTCTTCGG AGTCAGACCT TACATTGTTG AG 

(2) INFORMATION FOR SEQ ID NO: 182: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 182 

ATCTCCCTGC TACCCACCTT 

(2) INFORMATION FOR SEQ ID NO: 183: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 183 
GTTTCTTGTT TTCAGTGAGT TTCTGTTGGG 

(2) INFORMATION FOR SEQ ID NO: 184: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184 
GTGTGCCAAA CAACATTTGC 

(2) INFORMATION FOR SEQ ID NO: 185: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 185
GTTTCTTCAA GCCATCAAGC TAGAGTGG 

(2) INFORMATION FOR SEQ ID NO: 186: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 186
GGGCTTTTAA ACCCTTATTT AACC 

(2) INFORMATION FOR SEQ ID NO: 187: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 187
GTTTCTTAGG TGATCTCAGA GCCACTCA 

(2) INFORMATION FOR SEQ ID NO: 188:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 188 
AGGGCAGGTG GGAACTTACT 

(2) INFORMATION FOR SEQ ID NO: 189: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 189 
GTTTCTTTGG AGTCAGTTGA GCTTTCTACC 

(2) INFORMATION FOR SEQ ID NO: 190: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 190 
TGAACTTGCC TACCTCCCAG 

(2) INFORMATION FOR SEQ ID NO: 191: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 191 
GTTTCTTAGC ATATATCCTT ACACAAGCAC A 
(2) INFORMATION FOR SEQ ID NO: 192:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 192 
CATGGTTCCA AAGGCAAGTT 

(2) INFORMATION FOR SEQ ID NO: 193: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 193 
GTTTCTTTTG AGGCTGAATG AGCTGTG 

(2) INFORMATION FOR SEQ ID NO: 194:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 194 
ACAGGTGGGA AGACTGAATG TC 

(2) INFORMATION FOR SEQ ID NO: 195: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 195 

GTTTCTTGCA GTACACATCA CATGACCTTG 

(2) INFORMATION FOR SEQ ID NO: 196: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 196 
GAAATAGGCG GAAACTGGTT C 

(2) INFORMATION FOR SEQ ID NO: 197: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 197 
GTTTCTTCGT TGTGGTTGTT CAGAAAGG 

(2) INFORMATION FOR SEQ ID NO: 198: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 198 
GGTCAAGTGT TCAGAACGCA TC 

(2) INFORMATION FOR SEQ ID NO: 199: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 199 
GTTTCTTGCA GGGATTATGC TAGGTCTGTA G 

(2) INFORMATION FOR SEQ ID NO: 200: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 200
AGCACTTCTG AGGAAGGGAC AC 

(2) INFORMATION FOR SEQ ID NO: 201: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 201 
GTTTCTTAGG GCAGGCAGAC ATACAAAC 

(2) INFORMATION FOR SEQ ID NO: 202:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 202 
GCCAATGTGT TCCTAGAGCG AC 

(2) INFORMATION FOR SEQ ID NO: 203:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 203
GTTTCTTTTA AAGGGGGTAG GGTGTCACC 

(2) INFORMATION FOR SEQ ID NO: 204: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 204
GGAAGGGAAA AGGACAAGGT TTTG 
(2) INFORMATION FOR SEQ ID NO: 205:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 205 
GTTTCTTAGC AAGAGCACTG GTGTAGGAGT C 

(2) INFORMATION FOR SEQ ID NO: 206:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 206 
GCTTTTCAAG CACTTGTCTC 

(2) INFORMATION FOR SEQ ID NO: 207: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 207 
TGGGATTGTG ACTTACCATG 

(2) INFORMATION FOR SEQ ID NO: 208:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 208 

ACTTGGTGTC TTATAGAAAG GTG 

(2) INFORMATION FOR SEQ ID NO: 209: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 25 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 209 
GTTTCTTAGC TGTGTTTGCT GCATC 

(2) INFORMATION FOR SEQ ID NO: 210: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 210 
AGATGTGTGA TGAGATGCAG 

(2) INFORMATION FOR SEQ ID NO: 211: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 211 
GTTTCTTCAA ATAGTGCAAC AAACCC 

(2) INFORMATION FOR SEQ ID NO: 212: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 212 
TGTCATTCTG AAAGTGCTTC C 

(2) INFORMATION FOR SEQ ID NO: 213:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 213

GTTTCTTCTG TAACTAACGA TCTGTAGTGG TG 

(2) INFORMATION FOR SEQ ID NO: 214:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 214 
TATCAAGGTA ATATAGTAGC CACGG 

(2) INFORMATION FOR SEQ ID NO: 215:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 215 
AGGTCTTTCA TGCAGAGTGG 

(2) INFORMATION FOR SEQ ID NO: 216:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 216
ATTGCCAAAA CTTGGAAGC 

(2) INFORMATION FOR SEQ ID NO: 217: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 217 
AGGTGACATA TCAAGACCCT G 
(2) INFORMATION FOR SEQ ID NO: 218:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 218
TTGTCAACGA AGCCCAC 

(2) INFORMATION FOR SEQ ID NO: 219: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 219 
GTTTCTTGCA AGATTGTGTG TATGGATG 

(2) INFORMATION FOR SEQ ID NO: 220: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 220 
GCTCTCTATG TGTTTGGGTG 

(2) INFORMATION FOR SEQ ID NO: 221: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 221 

AAGAGTACGC TAGTGGATGG 

(2) INFORMATION FOR SEQ ID NO: 222: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 222 
TCCATTAGAC CCAGAAAGG 

(2) INFORMATION FOR SEQ ID NO: 223: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:223 
GTTTCTTCAC CAGGCTGAGA TGTTACT 

(2) INFORMATION FOR SEQ ID NO: 224: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 224 
AATCGTTCCT TATCAGGTAA TTTGG 

(2) INFORMATION FOR SEQ ID NO: 225:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 225 
GTTTCTTCAA AGAAAGCAAT TCCATCATAA CA 

(2) INFORMATION FOR SEQ ID NO:226: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 226
GCATTTGTTG AAGCAAGCGG 

(2) INFORMATION FOR SEQ ID NO: 227: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 227 
CTTTGTTCCT TGGCTGATGG 

(2) INFORMATION FOR SEQ ID NO: 228: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 228
AATAGTACCA GACACACGTG 

(2) INFORMATION FOR SEQ ID NO: 229: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 229
CAATGGTTCA CAGCCCTTTT 

(2) INFORMATION FOR SEQ ID NO: 230: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 230 
AGCCTGGGAG ACAGAGTGAG 

(2) INFORMATION FOR SEQ ID NO: 231:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 231 
GTTTCTTGCA CTTTTTGGGG AAGGTG 

(2) INFORMATION FOR SEQ ID NO: 232:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 232 
GTTCCTCCCT TCCCTCTCC 

(2) INFORMATION FOR SEQ ID NO:233: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 233 
GTTTCTTTCA GGGACTGGAT TGTAG 

(2) INFORMATION FOR SEQ ID NO:234: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 234 

GTGTTCTTTA TGTGTAGTTC 

(2) INFORMATION FOR SEQ ID NO: 235: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 235
GTTTCTTGGC AACAGAGTGA GACTCA 

(2) INFORMATION FOR SEQ ID NO: 236: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 236 
GTGACATCCA GTGTTGGGAG 

(2) INFORMATION FOR SEQ ID NO: 237:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 237 
GTTTCTTCCT AAGCAAGCAA GCAATCA 

(2) INFORMATION FOR SEQ ID NO: 238:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 238 
AAAGGCAATT GGTGGACA 

(2) INFORMATION FOR SEQ ID NO: 239:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:239
GTTTCTTTTC AATCCTTGAT GCAAAGT 

(2) INFORMATION FOR SEQ ID NO: 240: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 240 
GGTGACAGAG CAAGATTTCG 

(2) INFORMATION FOR SEQ ID NO: 241: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:241 
GTTTCTTGTA GAGTTGAGGG AGCAGC 

(2) INFORMATION FOR SEQ ID NO: 242: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 242 
CATCCATCTC ATCCCATCAT 

(2) INFORMATION FOR SEQ ID NO: 243: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 243 
GTTTCTTTTC ACCCTACTGC CAACTTC 

(2) INFORMATION FOR SEQ ID NO: 244:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 244 
CCGCCATTTT AGAGAGCATA 

(2) INFORMATION FOR SEQ ID NO: 245:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 245
GTTTCTTTTC TGGGACAATT GGTAGGA 

(2) INFORMATION FOR SEQ ID NO: 246: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 246 
TTTGTGTTAT TATTTCAGGT GC 

(2) INFORMATION FOR SEQ ID NO: 247: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 247 

GTTTCTTGTT TTTTGTTTCA GTTTAGGAAC 

(2) INFORMATION FOR SEQ ID NO: 248: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 248 
CATACCCAAA TCGTTCTCTT CCTC 

(2) INFORMATION FOR SEQ ID NO: 249: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 249
GTTTCTTGGA AAAGCAAAGG CATCGTAGAG 

(2) INFORMATION FOR SEQ ID NO: 250: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 250 
TACTAACCAA AAGAGTTGGG G 

(2) INFORMATION FOR SEQ ID NO: 251: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:251
CTATCATTCA GAAAATGTTG GC 

(2) INFORMATION FOR SEQ ID NO: 252:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 252
GTATGGCAGT AGAGGGCATG 

(2) INFORMATION FOR SEQ ID NO:253: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 253 
AAGGTTACAT TTCAAGAAAT AAAGT 

(2) INFORMATION FOR SEQ ID NO: 254:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 254 
CTGTTCAGGC CTCAATATAT ACC 

(2) INFORMATION FOR SEQ ID NO: 255:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 255 
AAGAGGATAG GTGGGGTTTG 

(2) INFORMATION FOR SEQ ID NO: 256:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:256 
CCTCCCACCT AGACACAAT 

(2) INFORMATION FOR SEQ ID NO: 257:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 257
ATATGATCTT TGCATCCCTG 

(2) INFORMATION FOR SEQ ID NO: 258: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 258 
AAGAAAGACC TGGAAGGAAT 

(2) INFORMATION FOR SEQ ID NO: 259:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 259
AAACAGCAAA ACCTCATCTC 

(2) INFORMATION FOR SEQ ID NO: 260: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 260 

CCACCACTTA TTACCTGCAT 

(2) INFORMATION FOR SEQ ID NO: 261: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 261 
TGAATGAATG AATGAACGAA 

(2) INFORMATION FOR SEQ ID NO: 262: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 262 
AACTGTGATT GTGCCACTGC ACTC 

(2) INFORMATION FOR SEQ ID NO: 263: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 263 
GTTTCTTCAC CGCCTTTATC CCTCAAATG 

(2) INFORMATION FOR SEQ ID NO: 264: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 264 
GATGGGTGGA GGGCAGTTAA AG 

(2) INFORMATION FOR SEQ ID NO: 265:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:265
GTCAAGCAAC TTGTCCAAGG CTAC 

(2) INFORMATION FOR SEQ ID NO: 266:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:266 
CAGGCTATCA GTTTCCTTTG GAG 

(2) INFORMATION FOR SEQ ID NO: 267:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:267 
GGCAGGTAAT ACTGGAGAAT TAGG 

(2) INFORMATION FOR SEQ ID NO:268: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 268 
GACGGATCTC AGAGCCACTC 

(2) INFORMATION FOR SEQ ID NO: 269:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 269 
GTTTCTTAAA AGATAAGGGC TTTTAAACC 

(2) INFORMATION FOR SEQ ID NO:270:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 270 
AGTTTCACAG CTTGTTATGG 

(2) INFORMATION FOR SEQ ID NO: 271: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 271 
GGTTGATGAA GTGAGACTTT 

(2) INFORMATION FOR SEQ ID NO: 272:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 272 
ATGGTGGATG CATCCTGTG 

(2) INFORMATION FOR SEQ ID NO: 273: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 273 

GTTTCTTGTA TTGACTCCTC CTCTGC 

(2) INFORMATION FOR SEQ ID NO: 274: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 274 
CAGTAAACAT 

(2) INFORMATION FOR SEQ ID NO: 275:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:275 
TGTTGAGTGG 

(2) INFORMATION FOR SEQ ID NO: 276: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:276 
TCTCCTCAAT GTGCATGT 

(2) INFORMATION FOR SEQ ID NO: 277:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 277 

ATTCTACATA 

(2) INFORMATION FOR SEQ ID NO: 278: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 278 
GTGTTTGCAT 

(2) INFORMATION FOR SEQ ID NO: 279: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 279 
ACAAGTTGGC 

(2) INFORMATION FOR SEQ ID NO: 280:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:280 
TAGTACCAGA 

(2) INFORMATION FOR SEQ ID NO: 281: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 281 
TACATCCAAG AAAA 

(2) INFORMATION FOR SEQ ID NO: 282: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:282 
GAGACTCTGA CAAATATATA TA 

(2) INFORMATION FOR SEQ ID NO:283: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 283 
TGTTGATCGC CAAACCAAAA TC 

(2) INFORMATION FOR SEQ ID NO: 284: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 284 
AATGCATGTA TGTATATGGT GTGGTATGTG TACATATG 

(2) INFORMATION FOR SEQ ID NO: 285: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:285 
CCTCCCAGAA CAATCATGAT AA 

(2) INFORMATION FOR SEQ ID NO: 286: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 86 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:286: 



AGACAGTCTC AAAAAATATT TTAAAGAAAA AGCTGGATAA ATAACTAGCT TTAAGAAAAT 60
AAGAAGAAAA AGAAAGAAGA AAGTAA 86



(2) INFORMATION FOR SEQ ID NO: 287: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 86 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:287: 

AACTAGCTTT AAGAAAATAA GAAGAAAAAG AAAGAAGAAA GTAAGAAAGA GAAAGAAAAG 60 
AAAGAAAAGA AAGAGGAATG ATTGAC 86 

(2) INFORMATION FOR SEQ ID NO: 288: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:288: 
CGCGCACATA CACCCTTTCT CT 22 
(2) INFORMATION FOR SEQ ID NO: 289: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:289: 
CAGTAAACAT CATGTTGAGT GG 22 

(2) INFORMATION FOR SEQ ID NO: 290: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 290:
TCTCCTCAAT GTGCATGTGT GCATGAGTGC ACATTCTACA TA 
(2) INFORMATION FOR SEQ ID NO: 291: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:291: 
GTGTTTGCAT GTTGTACAAG TTGGC 

(2) INFORMATION FOR SEQ ID NO: 292:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 292: 
TAGTACCAGA CACGTGCAGG CAAGCGCACC ATACATCCAA GAAAA 

(2) INFORMATION FOR SEQ ID NO: 293:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:293: 
GGAGGCTGAG CAGGGGTGCC 

(2) INFORMATION FOR SEQ ID NO: 294:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 294: 
ACTCCCACAG GTACCTGCAG

(2) INFORMATION FOR SEQ ID NO: 295:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 295 
CTGCCCTCAC GTAAGCGCCT 

(2) INFORMATION FOR SEQ ID NO: 296:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 296 
GCTGTTGCAG GGTAATGTTG 

(2) INFORMATION FOR SEQ ID NO: 297:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 297 
CATCAGACAG GTGCGTACA 

(2) INFORMATION FOR SEQ ID NO: 298:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 298
GGCTGGTGAG GAGGGGCTGA

(2) INFORMATION FOR SEQ ID NO: 299:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 299 
CGCTCTGTGG GTGAGCTTCA 

(2) INFORMATION FOR SEQ ID NO:300: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 300 
TGTGGAATAG CCCAATTACA 

(2) INFORMATION FOR SEQ ID NO: 301: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 301 
AGGGTGCTGA GTGAGTAGTA 

(2) INFORMATION FOR SEQ ID NO: 302: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 302 
TTCTTTTCAG GCCCTCGTGT 

(2) INFORMATION FOR SEQ ID NO: 303: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 303 
TGCTGACCCG GTATGGTGGT

(2) INFORMATION FOR SEQ ID NO: 304: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:304 
TTTGGTGCAG CCTGTGACTC 

(2) INFORMATION FOR SEQ ID NO: 305: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 305 
CGCACACAAG GTCAGTGTTC 

(2) INFORMATION FOR SEQ ID NO: 306:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 306 
TCTTTCCCAG GTTACTCCTT 

(2) INFORMATION FOR SEQ ID NO: 307: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 307 
ATCAAAGACT GTAAGTAACC

(2) INFORMATION FOR SEQ ID NO: 308:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 308 
TCTATTTCAG ATGCTGATTC 

(2) INFORMATION FOR SEQ ID NO: 309: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 309 
AGTAGAACAA GTAAGTGCAG 

(2) INFORMATION FOR SEQ ID NO: 310: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 310 
TTTTCAAAAG GCCTCCAAAG 

(2) INFORMATION FOR SEQ ID NO: 311: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 311
GAGCCCTGAG GTAAGTTAAT

(2) INFORMATION FOR SEQ ID NO:312:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 312
GCTTTTTCAG ATACTACTAT 

(2) INFORMATION FOR SEQ ID NO: 313:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 313 
TAACATGTTC AACTGTCTGT 

(2) INFORMATION FOR SEQ ID NO: 314:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 314 
TGTTATATGC ATTTATCTTC 

(2) INFORMATION FOR SEQ ID NO: 315: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 315 
GGTAAATGAG GTAAGTCCTG 

(2) INFORMATION FOR SEQ ID NO: 316:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 



(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:316: 
TCTTGTTAAG ATCGCTCTCT 

(2) INFORMATION FOR SEQ ID NO: 317: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 317: 
CCTTGCCCAG GTTCTCTTAA 

(2) INFORMATION FOR SEQ ID NO: 318: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 318 
GCAATCGCAC CTGCACACCC 

(2) INFORMATION FOR SEQ ID NO: 319: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 319 
ACTGCCCATT TCTGGTAAAG 

(2) INFORMATION FOR SEQ ID NO: 320: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 320 
CCCCTAACAG ATCATGATTC

(2) INFORMATION FOR SEQ ID NO: 321: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 321 
ACGTGCAATG GTAAGAGGGC 

(2) INFORMATION FOR SEQ ID NO:322: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 322
TGTTTTGCAG TTTCCAGTGG 

(2) INFORMATION FOR SEQ ID NO: 323: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 323 
AAGTGGAACG GTGACTCTCT 

(2) INFORMATION FOR SEQ ID NO: 324: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 324 
TCCTTCACAG GCCAGTGCAG 

(2) INFORMATION FOR SEQ ID NO: 325: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:325: 
GAACAAACTG GTGAGTAGTA 

(2) INFORMATION FOR SEQ ID NO: 326: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:326: 
TTTTTTGTAG AGCCTTCCAT 

(2) INFORMATION FOR SEQ ID NO: 327: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 327: 
AGCACAGTAG GTAACTAACT 

(2) INFORMATION FOR SEQ ID NO: 328: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 328: 
ATGGCCACAG ATTTGTTGGA 

(2) INFORMATION FOR SEQ ID NO: 329: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 329: 
CTTCCTGTTG GTAAGCTGTC 

(2) INFORMATION FOR SEQ ID NO: 330: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 330: 
TTCTCCTTAG CAGAGTCACC 

(2) INFORMATION FOR SEQ ID NO: 331: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 331
AAAAAGCACA GTAAGTTGGC 

(2) INFORMATION FOR SEQ ID NO: 332: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 332 
TTTTCATCAG ACCCGAGAGG 

(2) INFORMATION FOR SEQ ID NO: 333:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 333 
GAGCTATGAG GTGAGGAGTT

(2) INFORMATION FOR SEQ ID NO: 334: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 334 
TTTGTTACAG ATATTACTAC 

(2) INFORMATION FOR SEQ ID NO: 335: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 335 
AGCCTGGAAA TGCGTGTTTC 

(2) INFORMATION FOR SEQ ID NO:336: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 336 
CGAGAATTCA CTCGAGCATC AGG 

(2) INFORMATION FOR SEQ ID NO: 337: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 337 
CCTGATGCTC GAGTGAATTC T 

(2) INFORMATION FOR SEQ ID NO: 338:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 848 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: Coding Sequence

(B) LOCATION: 1...848

(D) OTHER INFORMATION:



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:338: 

ATG ATT CTG GAA GGA AGT GGT GTA ATG AAT CTC AAC CCA GCC AAC AAC 48
Met Ile Leu Glu Gly Ser Gly Val Met Asn Leu Asn Pro Ala Asn Asn
1 5 10 15

CTC CTT CAC CAG CAA CCA GCC TGG CCG GAC AGC TAC CCC ACA TGC AAT 96
Leu Leu His Gln Gln Pro Ala Trp Pro Asp Ser Tyr Pro Thr Cys Asn
20 25 30

GTT TCC AGC GGT TTT TTT GGA AGC CAG TGG CAT GAA ATC CAC CCT CAG 144
Val Ser Ser Gly Phe Phe Gly Ser Gln Trp His Glu Ile His Pro Gln
35 40 45

TAC TGG ACC AAA TAC CAG GTG TGG GAA TGG CTG CAG CAC CTC CTG GAC 192
Tyr Trp Thr Lys Tyr Gln Val Trp Glu Trp Leu Gln His Leu Leu Asp
50 55 60

ACC AAC CAG CTA GAC GCT AGC TGC ATC CCT TTC CAG GAG TTC GAC ATT 240
Thr Asn Gln Leu Asp Ala Ser Cys Ile Pro Phe Gln Glu Phe Asp Ile
65 70 75 80

AGC GGA GAA CAC CTG TGC AGC ATG AGT CTG CAG GAG TTC ACG AGG GCA 288
Ser Gly Glu His Leu Cys Ser Met Ser Leu Gln Glu Phe Thr Arg Ala
85 90 95

GCA GGC TCA GCT GGG CAG CTG CTC TAC AGC AAC CTA CAG CAT CTC AAG 336
Ala Gly Ser Ala Gly Gln Leu Leu Tyr Ser Asn Leu Gln His Leu Lys
100 105 110

TGG AAC GGC CAA TGC AGC AGT GAC CTT TTC CAG TCC GCA CAC AAT GTC 384
Trp Asn Gly Gln Cys Ser Ser Asp Leu Phe Gln Ser Ala His Asn Val
115 120 125

ATT GTC AAG ACT GAA CAA ACC GAT CCT TCC ATC ATG AAC ACA TGG AAA 432
Ile Val Lys Thr Glu Gln Thr Asp Pro Ser Ile Met Asn Thr Trp Lys
130 135 140

GAA GAA AAC TAT CTC TAT GAT CCC AGC TAT GGT AGC ACA GTA GAT CTG 480
Glu Glu Asn Tyr Leu Tyr Asp Pro Ser Tyr Gly Ser Thr Val Asp Leu
145 150 155 160

TTG GAC AGT AAG ACT TTC TGC CGG GCT CAG ATC TCC ATG ACA ACC TCC 528
Leu Asp Ser Lys Thr Phe Cys Arg Ala Gln Ile Ser Met Thr Thr Ser
165 170 175

AGT CAC CTT CCA GTT GCA GAG TCA CCT GAT ATG AAA AAG GAG CAA GAC 576
Ser His Leu Pro Val Ala Glu Ser Pro Asp Met Lys Lys Glu Gln Asp
180 185 190

CAC CCT GTA AAG TCC CAC ACC AAA AAG CAC AAC CCA AGA GGC ACT CAC 624
His Pro Val Lys Ser His Thr Lys Lys His Asn Pro Arg Gly Thr His
195 200 205

TTA TGG GAG TTC ATC CGA GAC ATT CTC TTG AGC CCA GAC AAG AAC CCA 672
Leu Trp Glu Phe Ile Arg Asp Ile Leu Leu Ser Pro Asp Lys Asn Pro
210 215 220

GGG CTG ATC AAA TGG GAA GAC CGT TCG GAA GGC ATC TTC AGG TTC CTG 720
Gly Leu Ile Lys Trp Glu Asp Arg Ser Glu Gly Ile Phe Arg Phe Leu
225 230 235 240

AAG TCA GAA GCT GTG GCT CAG CTG TGG GGG AAA AAG AAA AAT AAC AGT 768
Lys Ser Glu Ala Val Ala Gln Leu Trp Gly Lys Lys Lys Asn Asn Ser
245 250 255

AGC ATG ACA TAC GAG AAG CTC AGC CGG GCT ATG AGA TAT TAC TAC AAA 816
Ser Met Thr Tyr Glu Lys Leu Ser Arg Ala Met Arg Tyr Tyr Tyr Lys
260 265 270

CGA GAA ATC CTG GAA CGT GTG GAT GGA CGA CG 848
Arg Glu Ile Leu Glu Arg Val Asp Gly Arg Arg
275 280 
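
The codon-by-codon layout of SEQ ID NO:338 can be spot-checked with a short script. The following is a minimal sketch in Python and is not part of the application as filed. It assumes the standard genetic code, and the names `CODON_TABLE`, `THREE_LETTER`, and `translate` are illustrative only. It translates the first row of the listing and shows why an 848-base sequence carries 283 residues: 848 = 3 x 282 + 2, and the trailing partial codon CG specifies Arg whatever the third base would be.

```python
# Illustrative check of the SEQ ID NO:338 layout (assumes the standard genetic code).
BASES = "TCAG"
# One-letter residues for all 64 codons in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna):
    """Translate the complete codons of a DNA string into three-letter residues."""
    dna = "".join(dna.split()).upper()  # drop the spaces between codons
    usable = len(dna) - len(dna) % 3    # ignore a trailing partial codon
    return [THREE_LETTER[CODON_TABLE[dna[i:i + 3]]] for i in range(0, usable, 3)]


# First row of the listing (bases 1 to 48).
first_row = "ATG ATT CTG GAA GGA AGT GGT GTA ATG AAT CTC AAC CCA GCC AAC AAC"
print(" ".join(translate(first_row)))

# 848 bases give 282 complete codons plus 2 leftover bases.
full_codons, leftover = divmod(848, 3)
print(full_codons, leftover)

# The leftover bases are CG; every CGN codon encodes Arg, so residue 283 is determined.
print({CODON_TABLE["CG" + n] for n in BASES})
```

Run on the first row, the sketch prints Met Ile Leu Glu Gly Ser Gly Val Met Asn Leu Asn Pro Ala Asn Asn, which agrees with residues 1 to 16 of the listing.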



(2) INFORMATION FOR SEQ ID NO: 339: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 283 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(v) FRAGMENT TYPE: internal 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 339: 
Gly Ser Ala Gly Gln Leu Leu Tyr Ser Asn Leu Gln His Leu Lys
100 105 110

Trp Asn Gly Gln Cys Ser Ser Asp Leu Phe Gln Ser Ala His Asn Val
115 120 125

Ile Val Lys Thr Glu Gln Thr Asp Pro Ser Ile Met Asn Thr Trp Lys
130 135 140

Glu Glu Asn Tyr Leu Tyr Asp Pro Ser Tyr Gly Ser Thr Val Asp Leu
145 150 155 160

Leu Asp Ser Lys Thr Phe Cys Arg Ala Gln Ile Ser Met Thr Thr Ser
165 170 175

Ser His Leu Pro Val Ala Glu Ser Pro Asp Met Lys Lys Glu Gln Asp
180 185 190

His Pro Val Lys Ser His Thr Lys Lys His Asn Pro Arg Gly Thr His
195 200 205

Leu Trp Glu Phe Ile Arg Asp Ile Leu Leu Ser Pro Asp Lys Asn Pro
210 215 220

Gly Leu Ile Lys Trp Glu Asp Arg Ser Glu Gly Ile Phe Arg Phe Leu
225 230 235 240

Lys Ser Glu Ala Val Ala Gln Leu Trp Gly Lys Lys Lys Asn Asn Ser
245 250 255

Ser Met Thr Tyr Glu Lys Leu Ser Arg Ala Met Arg Tyr Tyr Tyr Lys
260 265 270

Arg Glu Ile Leu Glu Arg Val Asp Gly Arg Arg
275 280






WO 99/37809 PCT/US98/01260 

What is Claimed is: 

1. An isolated nucleic acid molecule comprising a sequence within a
mammalian ASTH1 locus, or a polymorphic variant thereof.

2. An isolated nucleic acid molecule according to Claim 1, wherein said
nucleic acid molecule encodes an ASTH1 polypeptide.

3. An isolated nucleic acid molecule according to Claim 1 wherein said 
nucleic acid comprises a promoter or regulatory region. 

4. An isolated nucleic acid molecule according to Claim 1 comprising a 
probe for detection of an ASTH1 locus polymorphism. 

5. An array of oligonucleotides comprising:
two or more probes according to Claim 4.

6. An isolated nucleic acid comprising a microsatellite repeat associated 
with a predisposition to asthma. 

7. A nucleic acid according to any of claims 1 to 5, wherein said ASTH1
locus is human.

8. A cell comprising a nucleic acid composition according to any of 
claims 1 to 4. 

9. A purified polypeptide composition comprising at least 50 weight % of 
the protein present as the product of the nucleic acid of Claim 1.

10. A method for detecting a predisposition to asthma in an individual, the
method comprising:

analyzing the genomic DNA or mRNA of said individual for the presence of at
least one predisposing ASTH1 locus polymorphism or a sequence linked to a
predisposing polymorphism; wherein the presence of said predisposing
polymorphism is indicative of an increased susceptibility to asthma.



11. A method according to Claim 10, wherein said analyzing step
comprises detection of specific binding between the genomic DNA or mRNA of said
individual with a probe or probes according to either of Claims 4 or 5.

12. A method according to Claim 10, wherein said analyzing step
comprises detection of specific binding between the genomic DNA or mRNA of said
individual with a microsatellite marker listed in Table 1.

13. A non-human transgenic animal model for ASTH1 gene function
comprising one of:

(a) a knockout of an ASTH1 gene;

(b) an exogenous and stably transmitted mammalian ASTH1 gene
sequence; or

(c) an ASTH1 promoter sequence operably linked to a reporter gene.

14. A method of screening for biologically active agents that modulate
ASTH1 function, the method comprising:

combining a candidate biologically active agent with any one of:

(a) a mammalian ASTH1 polypeptide;

(b) a cell comprising a nucleic acid encoding a mammalian ASTH1
polypeptide; or

(c) a non-human transgenic animal model for ASTH1 gene function
comprising one of: (i) a knockout of an ASTH1 gene; (ii) an exogenous and stably
transmitted mammalian ASTH1 gene sequence; or (iii) an ASTH1 promoter
sequence operably linked to a reporter gene; and

determining the effect of said agent on ASTH1 function.

15. An isolated nucleic acid that hybridizes under stringent conditions to
any one of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID
NO:6, SEQ ID NO:8, SEQ ID NO:10, or SEQ ID NO:328.

16. An isolated nucleic acid that encodes a polypeptide or fragment
thereof having an amino acid sequence substantially identical to the sequence as
set forth within any one of SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID
NO:11, or SEQ ID NO:339.



[Drawing sheet 1/1]



INTERNATIONAL SEARCH REPORT 



International application No. 
PCT/US98/01260 



A. CLASSIFICATION OF SUBJECT MATTER 

IPC(6): C12Q 1/68
US CL: 435/6

According to International Patent Classification (IPC) or to both national classification and IPC



B. FIELDS SEARCHED



Minimum documentation searched (classification system followed by classification symbols) 
U.S.: 435/6



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 
NONE 



Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 
NONE 



C. DOCUMENTS CONSIDERED TO BE RELEVANT 



Category*

Citation of document, with indication, where appropriate, of the relevant passages

Relevant to claim No.



SANDFORD et al. Localisation of atopy and β subunit of high-affinity
IgE receptor (FcεRI) on chromosome 11q. The Lancet. 06 February 1993,
Vol. 341, pages 332-334, see entire article. (Relevant to claims 1-14)

GERHARD et al. Isolation of 1001 New Markers from Human
Chromosome 11, Excluding the Region of 11p13-p15.5, and Their
Sublocalization by a New Series of Radiation-Reduced Somatic Cell
Hybrids. Genomics. 1992, Vol. 13, pages 1133-1142, see entire
article. (Relevant to claims 1-14)



Further documents are listed in the continuation of Box C.

See patent family annex.

* Special categories of cited documents:

"A" document defining the general state of the art which is not considered
to be of particular relevance

"E" earlier document published on or after the international filing date

"L" document which may throw doubts on priority claim(s) or which is
cited to establish the publication date of another citation or other
special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or other
means

"P" document published prior to the international filing date but later than
the priority date claimed

"T" later document published after the international filing date or priority
date and not in conflict with the application but cited to understand
the principle or theory underlying the invention

"X" document of particular relevance; the claimed invention cannot be
considered novel or cannot be considered to involve an inventive step
when the document is taken alone

"Y" document of particular relevance; the claimed invention cannot be
considered to involve an inventive step when the document is
combined with one or more other such documents, such combination
being obvious to a person skilled in the art

"&" document member of the same patent family



Date of the actual completion of the international search 



13 APRIL 1998 



Name and mailing address of the ISA/US 
Commissioner of Patents and Trademarks 
Box PCT 

Washington, D.C. 20231
Facsimile No. (703) 305-3230 



Date of mailing of the international search report 

16 MAY 1998



Authorized officer 

EGGERTON CAMPBELL 
Telephone No. (703) 308-0196






Form PCT/ISA/210 (second sheet) (July 1992)



